Sequential probing of molecular targets based on pseudo-color barcodes with embedded error correction mechanism

ABSTRACT

The present invention, among other things, provides technologies for detecting and/or quantifying nucleic acids in cells, tissues, organs or organisms. Pre-designed barcodes are associated with specific molecular targets through sequential hybridization experiments. A pseudo-color based barcoding scheme is developed to overcome limitations of the previous generation of the technology, such as a lack of visual signals that can be associated with the probes or the limited internal space within cells when carrying out in situ experiments. The current method can be applied to both in vitro and in situ analysis. According to the method, each barcoding round comprises multiple serial hybridizations, where a small number of colored signals (associated with probes) are used in each hybridization experiment within a serial hybridization round. Images from each serial hybridization experiment within the same serial hybridization round are combined to form a composite image for each barcoding round. In each barcoding round, the same set of molecular targets is analyzed. After all barcoding rounds are completed, the association of the barcodes with these molecular targets is complete.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. patent application Ser. No. 15/225,820, filed on Aug. 1, 2016 and entitled “Multiplex Labeling of Molecules by Sequential Hybridization Barcoding Using Probes With Cleavable Linkers,” U.S. patent application Ser. No. 15/298,219, filed on Oct. 19, 2016 and entitled “Error Correction of Multiplex Imaging Analysis by Sequential Hybridization,” U.S. Patent Provisional Application No. 62/428,910, filed on Dec. 1, 2016 and entitled “Single Molecule Profiling Through Serial and Barcoded Hybridization,” U.S. Patent Provisional Application No. 62/456,291, filed on Feb. 8, 2017 and entitled “Imaging-based Transcriptomic and Translational Profiling of 1000 Genes with in vitro seqFISH,” and U.S. Patent Provisional Application No. 62/523,127, filed on Jun. 21, 2017 and entitled “Transcriptome Profiling of 10,000 mRNAs by RNA SPOTs,” each of which is hereby incorporated by reference herein in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under Grant No. HD075605 and under Grant No. OD008530 awarded by the National Institutes of Health. The government has certain rights in the invention.

BACKGROUND OF THE INVENTION

Transcription profiling of cells is essential for many purposes. Microscopy imaging that can resolve multiple mRNAs in single cells can provide valuable information regarding transcript abundance and localization, which is important for understanding the molecular basis of cell identity and for developing treatments for diseases. Molecular profiling, such as transcriptomic profiling of biological samples, is likewise essential for various purposes; for example, it allows one to assess gene expression levels to detect and identify abnormal growth states such as cancers. Using nucleic acid detection as an example, current nucleic acid-based assays such as qPCR and microarrays have been useful, but they do not reach single molecule sensitivity. Next generation sequencing, on the other hand, involves amplification of the sample and reverse transcription of mRNA, which can introduce biases and inaccuracies in quantification. Moreover, sample preparation and sequencing can be time-consuming and costly. Although imaging has been used for mRNA transcript quantification, it has been limited to a few hundred genes. Many scientific questions require thousands of genes, or even the whole transcriptome, to be quantified.

What is needed in the art are better methods and systems for carrying out imaging-based transcriptomic profiling at single molecule sensitivity with high accuracy in a time-efficient manner.

SUMMARY OF THE INVENTION

The present invention provides certain insights into challenges or defects associated with existing technologies for profiling transcripts or DNA loci in cells, particularly in single cells. Moreover, the present invention provides new technologies for achieving such profiling effectively, including in single cells. The provided technologies are broadly useful, including, for example, for profiling isolated cells, cells in tissues, cells in organs, and/or cells in organisms.

For example, the present invention provides the insight that existing technologies such as single cell RNA-seq or qPCR require single cells to be isolated and put into a multi-well format, which is a multi-step process that can be cost prohibitive, labor intensive and prone to artifacts. Furthermore, the present invention recognizes that existing in situ sequencing technologies that use enzymatic reactions to first convert the mRNA into a DNA template can be highly inefficient (for example, in the mRNA-to-DNA conversion process), so that often only a small fraction of the RNAs are converted and detected. The present invention provides the particular insight that one major downside of such low efficiency, which is estimated at 1% for RT and 10% for PLA, is that it can introduce significant noise and bias in gene expression measurements. The present invention further recognizes that existing spectral mRNA barcoding technologies that utilize single molecule fluorescence in situ hybridization (smFISH) require distinct fluorophores for scale-up and may be limited in the number of barcodes that can be generated. smFISH also requires splitting probes into barcoding subsets during hybridization. Because smFISH often uses two or more colors for a target, it produces a high density of objects in the image, which can increase the complexity of data analysis.

Among other things, the present invention provides new technologies for profiling, for example, transcripts and/or DNA loci, that overcome one or more or all of the problems associated with methods prior to the present invention. In some embodiments, the present invention provides methods for detecting multiple targets, e.g., transcripts or DNA loci, in a cell through a sequential barcoding scheme that permits multiplexing of different targets.

In one aspect, disclosed herein is a method of barcoding molecular targets. The method comprises: identifying N molecular targets in a biological sample, wherein the N molecular targets are immobilized; associating a unique barcode with each molecular target via n sequential barcoding rounds (where n≥2), wherein each barcoding round comprises m serial hybridizations of probes collectively bound to the N molecular targets (where m≥2); and optionally removing probes between two barcoding rounds. In some embodiments, each serial hybridization in turn further comprises: contacting one or more groups of probes with a subset of the N molecular targets, the total number of groups of probes corresponding to the number of molecular targets in the subset, where probes in each group comprise one or more binding sequences specifically targeting a molecular target in the subset, where each probe is capable of generating at least one detectable visual signal representing binding of the probe to a molecular target in the subset, and where probes in the one or more groups generate one or more different detectable visual signals corresponding to the number of molecular targets in the subset; detecting the detectable visual signals that reflect the binding between the one or more groups of probes and the subset of the N molecular targets; and removing the visual signals, when applicable, prior to the next serial hybridization; wherein the unique barcode for each molecular target consists of n components, each component is assigned from S unique symbols, and S is an integer equal to or greater than

$\sqrt[n]{N}.$

N and n are both integers.
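As an illustration of the capacity condition, the short Python sketch below finds the smallest integer S with S^n ≥ N, which is equivalent to S ≥ ⁿ√N. The function name `min_symbols` is hypothetical, and the exact integer search is an implementation choice (not something the method prescribes) that avoids floating-point rounding in N^(1/n).

```python
def min_symbols(num_targets: int, num_rounds: int) -> int:
    """Return the smallest integer S such that S ** num_rounds >= num_targets."""
    if num_targets < 1 or num_rounds < 1:
        raise ValueError("N and n must be positive integers")
    s = 1
    # Exact integer search: avoids rounding errors in num_targets ** (1 / num_rounds).
    while s ** num_rounds < num_targets:
        s += 1
    return s


print(min_symbols(1000, 3))  # 10
print(min_symbols(1001, 3))  # 11
```

For instance, 1000 targets over 3 rounds need S = 10 symbols, while 1001 targets already need S = 11. When x rounds are reserved for error correction, the same helper applies with n − x in place of n.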

In some embodiments, detecting the detectable visual signals comprises: capturing, for each serial hybridization, an image of the detectable visual signals that reflect the binding between the one or more groups of probes and the subset of the N molecular targets.

In some embodiments, the method further comprises: generating, for each barcoding round, a composite image by superimposing m images corresponding to the m serial hybridizations, wherein the m images are aligned based on one or more alignment references whose positions remain constant relative to the biological sample.
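One way to picture composite-image generation is to work on detected spot coordinates rather than raw pixels. The Python sketch below assumes that each serial hybridization has already been reduced to a list of spots (x, y, channel) and that the displacement of the alignment reference in each image is known. It also assumes, for illustration only, that the pseudo-color of a spot is encoded as k·F + channel, where k is the serial hybridization index and F the number of fluorescence channels. The names `build_composite`, `Spot` and `PseudoSpot` are hypothetical.

```python
from typing import List, Tuple

Spot = Tuple[float, float, int]        # (x, y, fluorescence channel index)
PseudoSpot = Tuple[float, float, int]  # (x, y, pseudo-color index)


def build_composite(
    images: List[List[Spot]],
    offsets: List[Tuple[float, float]],
    num_channels: int,
) -> List[PseudoSpot]:
    """Superimpose the spot lists of m serial hybridizations into one composite.

    offsets[k] is the (dx, dy) displacement of the alignment reference in image k
    relative to image 0; subtracting it maps every spot into a common frame.
    Each output spot carries the pseudo-color k * num_channels + channel.
    """
    if len(images) != len(offsets):
        raise ValueError("need exactly one offset per image")
    composite: List[PseudoSpot] = []
    for k, (spots, (dx, dy)) in enumerate(zip(images, offsets)):
        for x, y, channel in spots:
            if not 0 <= channel < num_channels:
                raise ValueError(f"channel {channel} out of range")
            composite.append((x - dx, y - dy, k * num_channels + channel))
    return composite


# Two serial hybridizations, three channels; the reference moved by (2, -1) between them.
merged = build_composite([[(10.0, 10.0, 0)], [(15.0, 7.0, 1)]], [(0.0, 0.0), (2.0, -1.0)], 3)
print(merged)  # [(10.0, 10.0, 0), (13.0, 8.0, 4)]
```

Under this encoding, m serial hybridizations with F channels give up to m·F distinguishable pseudo-colors in one composite, even though each individual image uses only F colors.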

In some embodiments, the method further comprises: applying Gaussian analysis to super-localize the detectable visual signals in an image.
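A lightweight stand-in for full two-dimensional Gaussian fitting is three-point Gaussian (log-parabola) interpolation, applied separately along x and y around a spot's brightest pixel. The sketch below is a minimal one-dimensional version under that assumption; it is not necessarily the Gaussian analysis used in the method, and because it takes logarithms it requires strictly positive intensities. `gaussian_subpixel_peak` is a hypothetical name.

```python
import math
from typing import Sequence


def gaussian_subpixel_peak(profile: Sequence[float]) -> float:
    """Estimate the sub-pixel center of one spot from a 1-D intensity profile.

    Fits a parabola to the log intensities of the brightest pixel and its two
    neighbours. The log of a Gaussian is a parabola, so a noise-free sampled
    Gaussian yields its exact center.
    """
    i = max(range(len(profile)), key=lambda k: profile[k])
    if i == 0 or i == len(profile) - 1:
        raise ValueError("peak must not lie on the profile edge")
    left, mid, right = (math.log(profile[j]) for j in (i - 1, i, i + 1))
    denom = left - 2.0 * mid + right
    if denom == 0:
        raise ValueError("flat profile: no unique peak")
    # Vertex of the parabola through (-1, left), (0, mid), (1, right).
    return i + 0.5 * (left - right) / denom


profile = [math.exp(-((x - 5.3) ** 2) / (2 * 1.3 ** 2)) for x in range(12)]
print(round(gaussian_subpixel_peak(profile), 6))  # 5.3
```

With real, noisy images a least-squares fit of a 2D Gaussian is the more robust choice; the three-point estimate is shown only because it makes the idea of super-localization easy to follow.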

In some embodiments, the method further comprises: decoding the detectable visual signals in each composite image based on the unique barcodes for the N molecular targets and the S unique symbols.
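Decoding can be sketched as a dictionary lookup once each spot has been linked across the n composite images and its symbol read out in every round. The Python below assumes that upstream linking step has already been done; the codebook contents, gene names and function names are hypothetical. A symbol sequence that matches no barcode is returned as `None` rather than forced onto the nearest target.

```python
from typing import Dict, Optional, Sequence, Tuple

Barcode = Tuple[int, ...]


def build_lookup(codebook: Dict[str, Barcode]) -> Dict[Barcode, str]:
    """Invert a target -> barcode codebook, refusing duplicate barcodes."""
    lookup: Dict[Barcode, str] = {}
    for target, code in codebook.items():
        if code in lookup:
            raise ValueError(f"barcode {code} used by both {lookup[code]} and {target}")
        lookup[code] = target
    return lookup


def decode_spot(symbols: Sequence[int], lookup: Dict[Barcode, str]) -> Optional[str]:
    """Return the target whose barcode equals the per-round symbols, else None."""
    return lookup.get(tuple(symbols))


# Hypothetical 3-round codebook over S = 3 symbols.
lookup = build_lookup({"geneA": (0, 1, 2), "geneB": (2, 2, 0)})
print(decode_spot([2, 2, 0], lookup))  # geneB
print(decode_spot([1, 1, 1], lookup))  # None
```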

In some embodiments, the method further comprises: detecting reference visual signals associated with the one or more alignment references.

In some embodiments, the one or more alignment references comprise one or more selected from the group consisting of an oligonucleotide sequence immobilized on the coverslips and detected by a complementary oligo, a common sequence embedded in all probes, a microscopic object, a metal bead, a gold bead, a polystyrene bead, a PCR handle sequence on a primary binding probe, and combinations thereof.

In some embodiments, the n sequential barcoding rounds include x rounds for error correction, where x is an integer equal to or greater than 1; and wherein assigning unique barcodes to each of the N molecular targets requires S unique symbols, where S is an integer equal to or greater than

$\sqrt[{n - x}]{N}.$
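The bound can be read as a counting argument: if x of the n components are fixed by the remaining ones for error correction, only n − x components are freely chosen, each from S symbols, so at most S^(n−x) distinct barcodes exist. The numbers in the worked line below are purely illustrative.

```latex
S^{\,n-x} \ge N \;\Longleftrightarrow\; S \ge \sqrt[n-x]{N},
\qquad \text{e.g. } N = 1000,\; n = 4,\; x = 1:\quad S \ge \sqrt[3]{1000} = 10 .
```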

In some embodiments, the biological sample comprises a tissue sample, a cell sample, a cell extract sample, a nucleic acid sample, an RNA transcript sample, a protein sample, an mRNA sample, DNA molecules, protein molecules, RNA and DNA isoform molecules, single nucleotide polymorphism molecules, or combinations thereof.

In some embodiments, the method further comprises: determining a secondary molecular target that is associated with the N molecular targets by contacting the biological sample with molecules that specifically bind to the secondary molecular target.

In some embodiments, the secondary molecular target comprises one selected from the group consisting of an RNA binding protein molecule, a ribosome, a DNA binding protein molecule, a transcription factor, a chromatin binding protein, a protein binding molecule, a scaffold protein, and combinations thereof.

In some embodiments, probes in the one or more groups of probes further comprise: one or more binding sequences, each specifically targeting one or more sites within a molecular target in the biological sample; and n unique readout sequences associated with the one or more binding sequences, wherein, in each barcoding round, only one unique readout sequence is associated with a detectable visual signal for a particular molecular target.
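To make the readout arrangement concrete, the sketch below uses a hypothetical indexing in which readout sequence r·S + s is the one read in barcoding round r when a target's symbol in that round is s. Under that assumption each target carries n readout sequences with exactly one active per round, and n·S distinct readout sequences cover the whole experiment. Function names and the codebook are illustrative.

```python
from typing import Dict, List, Tuple


def readout_ids(barcode: Tuple[int, ...], num_symbols: int) -> List[int]:
    """Readout sequence IDs embedded in the probes of one target.

    Hypothetical indexing: ID r * S + s is read in barcoding round r when the
    target's symbol in that round is s, so each target carries one ID per round.
    """
    ids = []
    for r, s in enumerate(barcode):
        if not 0 <= s < num_symbols:
            raise ValueError(f"symbol {s} out of range")
        ids.append(r * num_symbols + s)
    return ids


def readouts_for_round(
    codebook: Dict[str, Tuple[int, ...]], r: int, num_symbols: int
) -> Dict[int, List[str]]:
    """Group targets by the readout ID that lights them up in round r."""
    groups: Dict[int, List[str]] = {}
    for target, code in codebook.items():
        groups.setdefault(r * num_symbols + code[r], []).append(target)
    return groups


print(readout_ids((0, 3, 1), 4))  # [0, 7, 9]
print(readouts_for_round({"A": (0, 3, 1), "B": (0, 2, 1), "C": (1, 3, 0)}, 0, 4))  # {0: ['A', 'B'], 1: ['C']}
```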

In some embodiments, the one or more binding sequences target multiple different sites within the same molecular target. In some embodiments, the one or more binding sequences target multiple different sites within different molecular targets.

In some embodiments, each probe comprises one or more of the n unique readout sequences.

In some embodiments, at least one of the n unique readout sequences is located in an overhang sequence directly connected to the binding sequence of a probe.

In some embodiments, at least one of the n unique readout sequences is indirectly connected to the binding sequence of a probe via one or more intermediate molecules.

In some embodiments, the one or more intermediate molecules comprise an RNA bridge probe, a DNA bridge probe, a protein bridge probe, a probe for hybridization chain reaction (HCR), a hairpin nucleic acid probe, an HCR initiator, an HCR polymer, or combinations thereof.

In some embodiments, the one or more binding sequences specifically target one or more non-nucleic acid sites in the molecular target, and wherein the n unique readout sequences comprise nucleic acid sequences that are directly or indirectly connected to the binding sequences of the probes.

In some embodiments, the detectable visual signal is connected to the binding sequence of a probe or an intermediate molecule via a cleavable linker.

In some embodiments, the one or more binding sequences comprise a peptide sequence binding to a specific antigen within a particular molecular target, an aptamer, or a click chemistry group.

In some embodiments, the S unique symbols comprise colors, numbers, letters, shapes, or combinations thereof.

In some embodiments, for each serial hybridization, the one or more groups of probes are contacted with a non-overlapping subset of the N molecular targets.
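Non-overlapping subsets can be produced by a simple scheduling rule. The sketch below assumes, hypothetically, that symbol s in a given barcoding round is imaged in serial hybridization ⌊s / F⌋ using channel s mod F, where F is the number of fluorescence channels. With m serial hybridizations this supports up to m·F symbols per round, and every target falls into exactly one serial hybridization. The name `schedule_round` is illustrative.

```python
from typing import Dict, List, Tuple


def schedule_round(
    symbols: Dict[str, int], num_channels: int
) -> Dict[int, List[Tuple[str, int]]]:
    """Split one barcoding round into serial hybridizations.

    symbols maps each target to its symbol s for this round. Hypothetical rule:
    s is imaged in serial hybridization s // F using channel s % F, so each
    target appears in exactly one hybridization (non-overlapping subsets).
    """
    plan: Dict[int, List[Tuple[str, int]]] = {}
    for target, s in sorted(symbols.items()):
        hyb, channel = divmod(s, num_channels)
        plan.setdefault(hyb, []).append((target, channel))
    return plan


print(schedule_round({"A": 0, "B": 4, "C": 2, "D": 5}, 3))
# {0: [('A', 0), ('C', 2)], 1: [('B', 1), ('D', 2)]}
```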

In one aspect, disclosed herein is a method of hybridization analysis of binding between labeled probes and molecular targets in a biological sample. The method comprises: generating multiple composite images of labeled probes bound to a plurality of molecular targets in the biological sample, wherein each composite image is generated from a plurality of images of labeled probes collectively bound to the plurality of molecular targets, wherein the plurality of molecular targets are immobilized within the biological sample, and wherein each image of the plurality of images reveals: labeled probes bound to a subset of molecular targets within the plurality of molecular targets, wherein the labeled probes comprise one or more groups of probes, the total number of groups of probes corresponding to the number of molecular targets in the subset, wherein probes in each group comprise one or more binding sequences specifically targeting a molecular target in the subset, and wherein each labeled probe is capable of generating a visual signal representing binding of the probe to a molecular target; and one or more reference targets whose positions remain constant in the biological sample for aligning the plurality of images.

In some embodiments, the biological sample comprises a tissue sample, a cell sample, a cell extract sample, a nucleic acid sample, an RNA transcript sample, a protein sample, an mRNA sample, DNA molecules, protein molecules, RNA and DNA isoform molecules, single nucleotide polymorphism molecules, or combinations thereof.

In some embodiments, in each image, the labeled probes bind to a non-overlapping subset of molecular targets within the plurality of molecular targets.

In some embodiments, the method further comprises: contacting the one or more groups of probes with the subset of molecular targets of the plurality of molecular targets; detecting visual signals that reflect the binding between the one or more groups of probes and molecular targets in the subset; and removing the visual signals, when applicable, prior to a next round of hybridization of labeled probes binding to a new subset of molecular targets within the plurality of molecular targets.

In some embodiments, the method further comprises: detecting reference visual signals associated with the one or more alignment references.

In some embodiments, the method further comprises: aligning the plurality of images based on the positions of the one or more alignment references.
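Alignment against fixed references can be sketched, under the simplifying assumption of a pure translation between images, as the mean displacement of matched reference points; for a translation-only model this is the least-squares estimate. The code below also assumes the references (for example, beads) have already been matched between images. Rotation, scaling and non-uniform drift are not modeled, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_shift(reference: List[Point], moving: List[Point]) -> Point:
    """Least-squares translation taking matched reference points onto moving ones.

    reference[i] and moving[i] are the same alignment reference seen in the
    first image and in a later image. For a translation-only model the
    least-squares estimate is the mean displacement.
    """
    if not reference or len(reference) != len(moving):
        raise ValueError("need equal, non-empty lists of matched points")
    n = len(reference)
    dx = sum(m[0] - r[0] for r, m in zip(reference, moving)) / n
    dy = sum(m[1] - r[1] for r, m in zip(reference, moving)) / n
    return dx, dy


def align(points: List[Point], shift: Point) -> List[Point]:
    """Map points from the moving image back into the reference frame."""
    return [(x - shift[0], y - shift[1]) for x, y in points]


shift = estimate_shift([(0.0, 0.0), (10.0, 5.0)], [(2.0, -1.0), (12.0, 4.0)])
print(shift)                       # (2.0, -1.0)
print(align([(7.0, 3.0)], shift))  # [(5.0, 4.0)]
```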

In one aspect, disclosed herein is a sequential hybridization method that comprises the steps of: identifying a plurality of target genes; and associating, via sequential hybridization of binding probes to the plurality of target genes, a first plurality of unique codes with the plurality of target genes, where each target gene in the plurality of target genes is represented by a unique code in the first plurality of unique codes, and where the sequential hybridization comprises n rounds of hybridization (where n≥2). Here, each round of hybridization in the n rounds of hybridization in turn comprises the steps of: contacting the plurality of target genes with a plurality of binding probes, where each probe in the plurality of binding probes comprises a binding sequence that specifically binds a target sequence in a gene in the plurality of target genes, where target genes from the plurality of target genes are spatially transfixed from each other, and where each probe is capable of emitting a detectable visual signal upon binding of the probe to a target sequence; detecting visual signals that reflect the binding between the plurality of binding probes and the plurality of target genes; and removing the visual signals, when applicable, prior to the next round of hybridization. In some embodiments, probes used in the n rounds of hybridization are capable of emitting at least F types of detectable visual signals (where F≥2 and F^n is greater than the number of target genes in the plurality of target genes). In some embodiments, a unique code in the first plurality of unique codes for a target gene consists of n components. In some embodiments, each component is determined by visual signals that reflect the binding between binding probes and the target gene during one of the n rounds of hybridization. In some embodiments, the n rounds of hybridization include m error correction rounds (m≥1).
In some embodiments, a second plurality of unique codes for the plurality of target genes is generated after the m error correction rounds are removed from the n rounds of hybridization. In some embodiments, each unique code in the second plurality of unique codes consists of (n−m) components and uniquely represents a target gene in the plurality of target genes.
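The requirement that the (n−m)-component codes remain unique can be checked directly. The sketch below builds a toy codebook with F = 3 symbols and n = 3 rounds in which the last component is (j₁ + j₂) mod 3 (an illustrative choice of check component), then verifies that dropping any single round still leaves distinct codes, while dropping two rounds does not. All names are hypothetical.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

Code = Tuple[int, ...]


def project(code: Code, skip: Set[int]) -> Code:
    """Drop the components whose round index is in `skip`."""
    return tuple(c for i, c in enumerate(code) if i not in skip)


def unique_after_dropping(codes: List[Code], drop: Iterable[int]) -> bool:
    """True if the codes are still pairwise distinct after removing the given rounds."""
    skip = set(drop)
    reduced = [project(c, skip) for c in codes]
    return len(set(reduced)) == len(reduced)


# Toy codebook (illustrative): F = 3 symbols, n = 3 rounds,
# third component is the check value (j1 + j2) mod F.
F = 3
codes = [(j1, j2, (j1 + j2) % F) for j1, j2 in product(range(F), repeat=2)]

print(all(unique_after_dropping(codes, [r]) for r in range(3)))  # True
print(unique_after_dropping(codes, [0, 1]))  # False
```

Because any single round can be removed without collisions, a spot that is missed in one round can still be assigned to a unique target, which is the practical benefit of reserving a round for error correction.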

In some embodiments, the plurality of target genes are located on immobilized nucleic acids selected from the group consisting of mRNAs, chromosomal DNAs and combinations thereof. In some embodiments, n is 4 or greater, 5 or greater, or 10 or greater. In some embodiments, the m error correction rounds comprise one round of the n rounds of hybridization. In some embodiments, the one round of the n rounds of hybridization is a repeat of one of the remaining (n−1) rounds of the n rounds of hybridization. In some embodiments, m≤0.5n.

In some embodiments, the at least F types of detectable visual signals comprise one selected from the group consisting of a fluorescence signal, a color signal, a red signal, a green signal, a yellow signal, a combined color signal representing two or more colors, and combinations thereof.

In some embodiments, a probe in the plurality of binding probes further comprises a signal moiety that emits a detectable visual signal upon binding of the probe to a target sequence.

In some embodiments, the signal moiety is connected to the binding sequence of the probe via a cleavable linker.

In some embodiments, each component of an n-component unique code in the first plurality of unique codes is assigned a numerical value that corresponds to one of the at least F types of detectable visual signals, and at least one component of the n-component unique code is determined based on the numerical values of all or some of the other n−1 components. In some embodiments, the n-component unique code is determined as:

$\{j_1, j_2, \ldots, (a_1 j_1 + a_2 j_2 + \cdots + a_n j_n + C) \bmod F, \ldots, j_n\},$

where j₁ is a numerical value that corresponds to the detectable visual signals used in the first round of hybridization, j₂ is a numerical value that corresponds to the detectable visual signals used in the second round of hybridization, and jₙ is a numerical value that corresponds to the detectable visual signals used in the nth round of hybridization; and where j₁, j₂, ..., jₙ, a₁, a₂, ..., aₙ and n are non-zero integers and C is an integer.
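The check-component formula can be sketched in Python as below. Two assumptions are made and should be read as such: the weighted sum runs over the other n − 1 components (the text states the component is determined from the other components), and the check sits at a chosen position k. The helper `recover` shows how a single missed round could be filled in, which requires the coefficient at that position to be invertible modulo F; `pow(a, -1, F)` needs Python 3.8 or later. All function names are hypothetical.

```python
from typing import List, Sequence


def _check_value(code: Sequence[int], coeffs: Sequence[int], C: int, F: int, k: int) -> int:
    """(sum of a_i * j_i over positions i != k, plus C) mod F."""
    return (sum(a * j for i, (a, j) in enumerate(zip(coeffs, code)) if i != k) + C) % F


def add_check(data: Sequence[int], coeffs: Sequence[int], C: int, F: int, k: int) -> List[int]:
    """Insert the check component at position k among the n - 1 data components."""
    code = list(data[:k]) + [0] + list(data[k:])
    code[k] = _check_value(code, coeffs, C, F, k)
    return code


def is_consistent(code: Sequence[int], coeffs: Sequence[int], C: int, F: int, k: int) -> bool:
    """True if an observed code satisfies its check equation."""
    return code[k] % F == _check_value(code, coeffs, C, F, k)


def recover(code: Sequence[int], missing: int, coeffs: Sequence[int], C: int, F: int, k: int) -> int:
    """Fill in one missed component (value reported modulo F)."""
    if missing == k:
        return _check_value(code, coeffs, C, F, k)
    rest = sum(a * j for i, (a, j) in enumerate(zip(coeffs, code)) if i not in (k, missing))
    rhs = (code[k] - C - rest) % F
    # Requires gcd(coeffs[missing], F) == 1; pow raises ValueError otherwise.
    return (pow(coeffs[missing], -1, F) * rhs) % F


code = add_check([1, 2, 3], coeffs=[2, 3, 1, 1], C=0, F=7, k=3)
print(code)                                             # [1, 2, 3, 4]
print(recover(code, 1, [2, 3, 1, 1], C=0, F=7, k=3))    # 2
```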

In some embodiments, m, n, F, i, j and k are all integers.

In one aspect, disclosed herein is a hybridization method that comprises the steps of: identifying a plurality of target genes; performing sequential hybridization of binding probes to the plurality of target genes, where the sequential hybridization comprises n rounds of hybridization (where n≥2), and where each round of hybridization in the n rounds of hybridization in turn comprises: contacting the plurality of target genes with a plurality of binding probes, where each probe in the plurality of binding probes comprises a binding sequence that specifically binds a target sequence in a gene in the plurality of target genes, where target genes from the plurality of target genes are spatially transfixed from each other, and where each probe is capable of emitting a detectable visual signal upon binding of the probe to a target sequence; detecting visual signals that reflect the binding between the plurality of binding probes and the plurality of target genes, where each target gene in the plurality of target genes is represented by visual signals that are unique for the target gene, and where probes used in the n rounds of hybridization are capable of emitting at least F types of detectable visual signals (where F≥2 and F^n is greater than the number of target genes in the plurality of target genes); and removing the visual signals, when applicable, prior to the next round of hybridization; and performing serial hybridizations against one or more serial target genes, where the expression level of each serial target gene is above a predetermined threshold value, and where each serial hybridization in turn comprises: contacting the one or more serial target genes with a plurality of binding probes, where each probe in the plurality of binding probes comprises a binding sequence that specifically binds a target sequence in a serial target gene in the one or more serial target genes, where the one or more serial target genes are spatially transfixed from each other, where each probe is capable of emitting a detectable visual signal upon binding of the probe to the target sequence, and where probes binding to target sequences in the same serial target gene emit the same detectable visual signals; and detecting visual signals that reflect the binding between the plurality of binding probes and the one or more serial target genes.
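Selecting serial target genes by a predetermined expression threshold amounts to a simple partition of the candidate genes. In the sketch below, the expression units, gene names and threshold value are hypothetical; genes above the threshold are assigned to serial hybridization and the remainder to the n-round barcoding scheme.

```python
from typing import Dict, List, Tuple


def split_targets(expression: Dict[str, float], threshold: float) -> Tuple[List[str], List[str]]:
    """Partition candidate genes into (barcoded targets, serial targets).

    Genes whose expression level is above the threshold go to serial
    hybridization; all others are barcoded over the n rounds.
    """
    serial = sorted(g for g, level in expression.items() if level > threshold)
    barcoded = sorted(g for g, level in expression.items() if level <= threshold)
    return barcoded, serial


barcoded, serial = split_targets({"g1": 5.0, "g2": 120.0, "g3": 40.0, "g4": 300.0}, 100.0)
print(barcoded)  # ['g1', 'g3']
print(serial)    # ['g2', 'g4']
```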

In some embodiments, the n rounds of hybridization generate a first plurality of unique codes, where each target gene in the plurality of target genes is represented by a unique code in the first plurality of unique codes.

In some embodiments, a unique code in the first plurality of unique codes for a target gene consists of n components, where each component is determined by visual signals that reflect the binding between binding probes and the target gene during one of the n rounds of hybridization.

In some embodiments, the n rounds of hybridization include m error correction rounds (m≥1), where a second plurality of unique codes for the plurality of target genes is generated after the m error correction rounds are removed from the n rounds of hybridization, and where each unique code in the second plurality of unique codes consists of (n−m) components and uniquely represents a target gene in the plurality of target genes.

In some embodiments, the method of hybridization further comprises the step of identifying the one or more serial target genes based on expression levels of candidate target genes. In some embodiments, the plurality of target genes are located on immobilized nucleic acids selected from the group consisting of mRNAs, chromosomal DNAs and combinations thereof.

In some embodiments, the one or more serial target genes are located on immobilized nucleic acids selected from the group consisting of mRNAs, chromosomal DNAs and combinations thereof.

In some embodiments, each unique code in the first plurality of unique codes consists of n components, where each component of an n-component unique code in the first plurality of unique codes is assigned a numerical value that corresponds to one of the at least F types of detectable visual signals, and where at least one component of the n-component unique code is determined based on the numerical values of all or some of the other n−1 components. In some embodiments, the n-component unique code is determined as:

$\{j_1, j_2, \ldots, (a_1 j_1 + a_2 j_2 + \cdots + a_n j_n + C) \bmod F, \ldots, j_n\},$

where j₁ is a numerical value that corresponds to the detectable visual signals used in the first round of hybridization, j₂ is a numerical value that corresponds to the detectable visual signals used in the second round of hybridization, and jₙ is a numerical value that corresponds to the detectable visual signals used in the nth round of hybridization; and where j₁, j₂, ..., jₙ, a₁, a₂, ..., aₙ are non-zero integers and C is an integer.

In one aspect, disclosed herein is a non-transitory computer-readable medium containing instructions that, when executed by a computer processor, cause the computer processor to: associate, via sequential hybridization of binding probes to a plurality of target genes, a first plurality of unique codes with the plurality of target genes, where each target gene in the plurality of target genes is represented by a unique code in the first plurality of unique codes, and where the sequential hybridization comprises n rounds of hybridization (where n≥2). Here, each round of hybridization in the n rounds of hybridization in turn comprises: contacting the plurality of target genes with a plurality of binding probes, where each probe in the plurality of binding probes comprises a binding sequence that specifically binds a target sequence in a gene in the plurality of target genes, where target genes from the plurality of target genes are spatially transfixed from each other, and where each probe is capable of emitting a detectable visual signal upon binding of the probe to a target sequence; detecting visual signals that reflect the binding between the plurality of binding probes and the plurality of target genes; and removing the visual signals, when applicable, prior to the next round of hybridization.

In some embodiments, probes used in the n rounds of hybridization are capable of emitting at least F types of detectable visual signals (where F≥2 and F^n is greater than the number of target genes in the plurality of target genes). In some embodiments, a unique code in the first plurality of unique codes for a target gene consists of n components. In some embodiments, each component is determined by visual signals that reflect the binding between binding probes and the target gene during one of the n rounds of hybridization. In some embodiments, the n rounds of hybridization include m error correction rounds (m≥1). In some embodiments, a second plurality of unique codes for the plurality of target genes is generated after the m error correction rounds are removed from the n rounds of hybridization. In some embodiments, each unique code in the second plurality of unique codes consists of (n−m) components and uniquely represents a target gene in the plurality of target genes.

In one aspect, disclosed herein is a non-transitory computer-readable medium containing instructions that, when executed by a computer processor, cause the computer processor to: perform sequential hybridization of binding probes to a plurality of target genes, where the sequential hybridization comprises n rounds of hybridization (where n≥2).

Here, each round of hybridization in the n rounds of hybridization comprises: contacting the plurality of target genes with a plurality of binding probes, where each probe in the plurality of binding probes comprises a binding sequence that specifically binds a target sequence in a gene in the plurality of target genes, where target genes from the plurality of target genes are spatially transfixed from each other, and where each probe is capable of emitting a detectable visual signal upon binding of the probe to a target sequence; detecting visual signals that reflect the binding between the plurality of binding probes and the plurality of target genes, where each target gene in the plurality of target genes is represented by visual signals that are unique for the target gene, and where probes used in the n rounds of hybridization are capable of emitting at least F types of detectable visual signals (where F≥2 and F^n is greater than the number of target genes in the plurality of target genes); and removing the visual signals, when applicable, prior to the next round of hybridization. The instructions further cause the computer processor to perform serial hybridizations against one or more serial target genes, where the expression level of each serial target gene is above a predetermined threshold value, and where each serial hybridization comprises: contacting the one or more serial target genes with a plurality of binding probes, where each probe in the plurality of binding probes comprises a binding sequence that specifically binds a target sequence in a serial target gene in the one or more serial target genes, where the one or more serial target genes are spatially transfixed from each other, where each probe is capable of emitting a detectable visual signal upon binding of the probe to the target sequence, and where probes binding to target sequences in the same serial target gene emit the same detectable visual signals; and detecting visual signals that reflect the binding between the plurality of binding probes and the one or more serial target genes.

In any of the embodiments disclosed herein, m, n, F, i, j and k are all integers. Embodiments disclosed herein can be applied individually or in combination in any aspect disclosed herein.

In one aspect, disclosed herein is a composition comprising a plurality of primary probes, a first plurality of bridge probes, and a first plurality of readout probes.

In some embodiments, each primary probe in the plurality of primary probes comprises: a primary binding sequence that binds to a complementary target sequence in a target nucleic acid molecule, and a first overhang sequence connected to one end of the primary binding sequence.

In some embodiments, each bridge probe in the first plurality of bridge probes comprises a binding sequence that specifically binds to all or a part of the first overhang sequence of a primary probe of the plurality of primary probes, and one or more readout binding targets connected in series and linked to the binding sequence.

In some embodiments, each readout probe in the first plurality of readout probes comprises: a readout binding sequence that specifically binds to a first readout binding target of the one or more readout binding targets of a bridge probe of the first plurality of bridge probes, and a signal moiety linked to the readout binding sequence via a cleavable linker.

In these embodiments, the signal moiety is capable of emitting a first detectable visual signal upon binding of each readout probe from the first plurality of readout probes to the first readout binding target of the one or more readout binding targets.
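The three-tier probe architecture (primary probe, bridge probe, readout probe) can be modeled as plain data, with hybridization approximated by a perfect reverse-complement match. The Python below is a toy sketch under that assumption: it ignores melting temperature, mismatches and the chemistry of the cleavable linker, and all sequences, dye labels and names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class PrimaryProbe:
    binding: str   # complementary to the target nucleic acid
    overhang: str  # first overhang sequence


@dataclass
class BridgeProbe:
    binding: str                # binds all or part of a primary overhang
    readout_targets: List[str]  # readout binding targets connected in series


@dataclass
class ReadoutProbe:
    binding: str  # readout binding sequence
    dye: str      # label for the signal moiety; the cleavable linker is not modeled


def hybridizes(probe_seq: str, site: str) -> bool:
    """Toy rule: perfect Watson-Crick match to all or a contiguous part of site."""
    return revcomp(probe_seq) in site


primary = PrimaryProbe(binding="ACGTTGCA", overhang="GGATCCAATT")
bridge = BridgeProbe(binding=revcomp("GGATCCAATT"), readout_targets=["TTTCCCGGG", "AAAGGGCCC"])
readout = ReadoutProbe(binding=revcomp("TTTCCCGGG"), dye="dye_A")
print(hybridizes(bridge.binding, primary.overhang))            # True
print(hybridizes(readout.binding, bridge.readout_targets[0]))  # True
print(hybridizes(readout.binding, bridge.readout_targets[1]))  # False
```

In this toy model the readout probe pairs only with the first readout binding target on the bridge probe; further pluralities of readout probes would be modeled the same way against the other entries of `readout_targets`.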

In some embodiments, the composition further comprises: a second plurality of readout probes, wherein each readout probe comprises: a readout binding sequence that specifically binds to a second readout binding target of the one or more readout binding targets in a bridge probe of the first plurality of bridge probes, and a signal moiety linked to the readout binding sequence via a cleavable linker.

In these embodiments, the signal moiety is capable of emitting a second detectable visual signal upon binding of each readout probe from the second plurality of readout probes to the second readout binding target of the one or more readout binding targets.

In some embodiments, the composition further comprises: a second overhang sequence linked to the other end of the primary binding sequence.

In some embodiments, the composition further comprises: a second plurality of bridge probes, wherein each bridge probe comprises: a binding sequence that specifically binds to all or a part of the second overhang sequence of a primary probe of the plurality of primary probes, and one or more additional readout binding targets connected in series and linked to the binding sequence.

In some embodiments, the composition further comprises: a third plurality of readout probes, wherein each readout probe comprises: a readout binding sequence that specifically binds to a first additional readout binding target of the one or more additional readout binding targets in a bridge probe of the second plurality of bridge probes, and a signal moiety linked to the readout binding sequence via a cleavable linker.

In these embodiments, the signal moiety is capable of emitting a third detectable visual signal upon binding of each readout probe from the third plurality of readout probes to the first additional readout binding target of the one or more additional readout binding targets.

In some embodiments, the composition further comprises: a fourth plurality of readout probes. Each readout probe in the fourth plurality of readout probes comprises: a readout binding sequence that specifically binds to a second additional readout binding target of the one or more additional readout binding targets in a bridge probe of the second plurality of bridge probes, and a signal moiety linked to the readout binding sequence via a cleavable linker.

In these embodiments, the signal moiety is capable of emitting a fourth detectable visual signal upon binding of each readout probe from the fourth plurality of readout probes to the second additional readout binding target of the one or more additional readout binding targets.
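The three-layer probe architecture described above (primary probe, bridge probe, readout probe) can be modeled as a toy data structure to show which readout probes end up contributing a signal. The Python sketch below is illustrative only. The class and function names, the toy sequences, the dye labels, and the rule that "binding" means exact reverse complementarity are all assumptions and are not part of the disclosure. Real hybridization depends on thermodynamics, not exact string matching.

```python
from dataclasses import dataclass

# Toy model only: "binding" is treated as exact reverse complementarity.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hybridizes(probe_seq: str, target_seq: str) -> bool:
    """True if probe_seq is the reverse complement of some stretch of target_seq."""
    return bool(probe_seq) and revcomp(probe_seq) in target_seq


@dataclass
class PrimaryProbe:
    binding_seq: str        # binds the target nucleic acid molecule
    overhang_1: str         # first overhang sequence
    overhang_2: str = ""    # optional second overhang sequence


@dataclass
class BridgeProbe:
    binding_seq: str        # binds all or part of an overhang
    readout_targets: list   # readout binding targets connected in series


@dataclass
class ReadoutProbe:
    binding_seq: str        # binds one readout binding target
    signal: str             # label for the signal moiety (hypothetical dye name)


def detected_signals(primary, bridges, readouts):
    """Signals that would appear on one primary probe under this toy model."""
    overhangs = [oh for oh in (primary.overhang_1, primary.overhang_2) if oh]
    signals = set()
    for bridge in bridges:
        # A bridge probe contributes only if it binds one of the overhangs.
        if not any(hybridizes(bridge.binding_seq, oh) for oh in overhangs):
            continue
        for readout in readouts:
            # A readout probe contributes only if it binds a target on that bridge.
            if any(hybridizes(readout.binding_seq, t) for t in bridge.readout_targets):
                signals.add(readout.signal)
    return signals


# Usage with hypothetical toy sequences.
primary = PrimaryProbe(binding_seq="GGGGCCCC", overhang_1="AAACCCGGG")
bridge = BridgeProbe(binding_seq=revcomp("AAACCCGGG"),
                     readout_targets=["GATTACA", "CATCATC"])
first = ReadoutProbe(binding_seq=revcomp("GATTACA"), signal="dye-1")
unrelated = ReadoutProbe(binding_seq="TTTTTTT", signal="dye-2")
print(detected_signals(primary, [bridge], [first, unrelated]))  # {'dye-1'}
```

Keeping the readout binding targets on the bridge probe, rather than on the primary probe, means the same readout probes can be reused with different bridge probes; the model reflects this because `detected_signals` only ever inspects `bridge.readout_targets`.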

In some embodiments, the cleavable linker is selected from the group consisting of an enzyme cleavable linker, a nucleophile/base sensitive linker, a reduction sensitive linker, a photo-cleavable linker, an electrophile/acid sensitive linker, a metal-assisted cleavable linker, and an oxidation sensitive linker.

In some embodiments, the cleavable linker is a disulfide bond or a nucleic acid restriction site. In some embodiments, the one or more readout binding targets comprise three or more readout binding targets.

In some embodiments where a second overhang sequence is present, the one or more additional readout binding targets comprise three or more readout binding targets.

In one aspect, disclosed herein is a sequential hybridization method utilizing a plurality of primary probes, a first plurality of bridge probes, and a first plurality of readout probes. In some embodiments, the method comprises the steps of: a) contacting a target nucleic acid molecule with a plurality of primary probes, where each primary probe comprises: a primary binding sequence that binds to a complementary target sequence within the target nucleic acid molecule, and a first overhang sequence connected to one end of the primary binding sequence; b) contacting, after step a), the target nucleic acid molecule with a first plurality of bridge probes, where each bridge probe comprises: a binding sequence that specifically binds to all or a part of the first overhang sequence of a primary probe of the plurality of primary probes, and one or more readout binding targets connected in series and linked to the binding sequence; and c) contacting, after step b), the target nucleic acid molecule with a first plurality of readout probes, wherein each readout probe comprises: a readout binding sequence that specifically binds to a first readout binding target of the one or more readout binding targets of a bridge probe of the first plurality of bridge probes, and a signal moiety linked to the readout binding sequence via a cleavable linker.

In these embodiments, the signal moiety is capable of emitting a first detectable visual signal upon binding of each readout probe from the first plurality of readout probes to the first readout binding target of the one or more readout binding targets of a bridge probe of the first plurality of bridge probes.

In some embodiments, the method further comprises the steps of: c1) imaging the target nucleic acid molecule after step c) so that interactions between the first plurality of readout probes and the first readout binding target of the one or more readout binding targets of a bridge probe are detected by the presence of the first detectable visual signal; and c2) applying, after step c1), a cleaving agent to cleave the linker, thereby eliminating the signal moiety from each readout probe in the first plurality of readout probes.

In some embodiments, the method further comprises: d) contacting, after step c), the target nucleic acid molecule with a second plurality of readout probes. Each readout probe comprises: a readout binding sequence that specifically binds to a second readout binding target of the one or more readout binding targets of a bridge probe, and a signal moiety linked to the readout binding sequence via a cleavable linker.

In these embodiments, the signal moiety is capable of emitting a second detectable visual signal upon binding of each readout probe from the second plurality of readout probes to the second readout binding target of the one or more readout binding targets of a bridge probe of the first plurality of bridge probes.

In some embodiments, the method further comprises: d1) imaging the target nucleic acid molecule after step d) so that interactions between the second plurality of readout probes and the second readout binding target of the one or more readout binding targets of a bridge probe are detected by the presence of the second detectable visual signal; and d2) applying a cleaving agent to cleave the linker, thereby eliminating the signal moiety from each readout probe in the second plurality of readout probes.

In some embodiments, each primary probe in the plurality of primary probes further comprises: a second overhang sequence connected to the other end of the primary binding sequence.

In some embodiments, the method further comprises: e) contacting, after step d), the target nucleic acid molecule with a second plurality of bridge probes. Each bridge probe comprises: a binding sequence that specifically binds to all or a part of the second overhang sequence of a primary probe of the plurality of primary probes, and one or more additional readout binding targets connected in series and linked to the binding sequence.

In some embodiments, the method further comprises: f) contacting, after step e), the target nucleic acid molecule with a third plurality of readout probes. Each readout probe comprises: a readout binding sequence that specifically binds to a first additional readout binding target of the one or more additional readout binding targets of a bridge probe in the second plurality of bridge probes, and a signal moiety linked to the readout binding sequence via a cleavable linker.

In these embodiments, the signal moiety is capable of emitting a third detectable visual signal upon binding of each readout probe from the third plurality of readout probes to the first additional readout binding target of the one or more additional readout binding targets.

In some embodiments, the method further comprises: f1) imaging the target nucleic acid molecule after step f) so that interactions between the third plurality of readout probes and the first additional readout binding target of the one or more additional readout binding targets of a bridge probe in the second plurality of bridge probes are detected by the presence of the third detectable visual signal; and f2) applying a cleaving agent to cleave the linker, thereby eliminating the signal moiety from each readout probe in the third plurality of readout probes.

In some embodiments, the method further comprises: g) contacting, after step f), the target nucleic acid molecule with a fourth plurality of readout probes. Each readout probe comprises: a readout binding sequence that specifically binds to a second additional readout binding target of the one or more additional readout binding targets of a bridge probe in the second plurality of bridge probes, and a signal moiety linked to the readout binding sequence via a cleavable linker.

In these embodiments, the signal moiety is capable of emitting a fourth detectable visual signal upon binding of each readout probe from the fourth plurality of readout probes to the second additional readout binding target of the one or more additional readout binding targets.

In some embodiments, the method further comprises: g1) imaging the target nucleic acid molecule after step g) so that interactions between the fourth plurality of readout probes and the second additional readout binding target of the one or more additional readout binding targets of a bridge probe in the second plurality of bridge probes are detected by the presence of the fourth detectable visual signal; and g2) applying a cleaving agent to cleave the linker, thereby eliminating the signal moiety from each readout probe in the fourth plurality of readout probes.
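The order of the contacting, imaging, and cleaving steps matters because each cleavage removes the previous signal before the next plurality of readout probes is imaged. The toy Python simulation below illustrates that point under simplifying assumptions that are not stated in the source: binding is all-or-nothing, cleavage is complete, and readout binding targets and signals are plain string labels chosen for illustration. The function name and parameters are hypothetical.

```python
def run_readout_rounds(present_targets, rounds, cleave=True):
    """
    Simulate sequential readout hybridizations on one molecule (toy model).

    present_targets: readout binding targets carried by the bound bridge probes.
    rounds: ordered (readout_target, signal) pairs, one per plurality of readout probes.
    cleave: if True, apply the cleaving agent after each image.
    Returns one frozenset of visible signals per image.
    """
    attached = set()  # signal moieties currently attached to the molecule
    images = []
    for target, signal in rounds:
        # Contacting step: readout probes bind only if their target is present.
        if target in present_targets:
            attached.add(signal)
        # Imaging step: record every signal that is currently visible.
        images.append(frozenset(attached))
        # Cleaving step: cutting the linker releases all attached signal moieties.
        if cleave:
            attached.clear()
    return images


rounds = [("R1", "signal-1"), ("R2", "signal-2"),
          ("R3", "signal-3"), ("R4", "signal-4")]
print(run_readout_rounds({"R1", "R3", "R4"}, rounds))
```

With `cleave=False`, signals from earlier rounds persist and accumulate in later images; this is the carry-over that the cleaving steps are intended to prevent.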

In some embodiments, the target nucleic acid molecule is an mRNA or a DNA. In some embodiments, the target nucleic acid molecule is within an intact mammalian cell. In some embodiments, the intact mammalian cell is a human cell.

In some embodiments, the cleavable linker is selected from the group consisting of an enzyme cleavable linker, a nucleophile/base sensitive linker, a reduction sensitive linker, a photo-cleavable linker, an electrophile/acid sensitive linker, a metal-assisted cleavable linker, and an oxidation sensitive linker. In some embodiments, the cleavable linker is a disulfide bond or a nucleic acid restriction site. In some embodiments, the one or more readout binding targets comprise three or more readout binding targets.

In embodiments where a second overhang sequence is present, the one or more additional readout binding targets comprise three or more readout binding targets.

In one aspect, disclosed herein is a composition that comprises a plurality of primary probes and a first plurality of readout probes. In these embodiments, each primary probe comprises: a primary binding sequence that binds to a complementary target sequence in a target nucleic acid molecule, and a first overhang sequence connected to one end of the primary binding sequence, wherein the first overhang sequence comprises one or more readout binding targets connected in series. Also in these embodiments, each readout probe comprises: a readout binding sequence that specifically binds to a first readout binding target of the one or more readout binding targets in a first overhang sequence, and a signal moiety linked to the readout binding sequence via a cleavable linker. In these embodiments, the signal moiety is capable of emitting a first detectable visual signal upon binding of each readout probe from the first plurality of readout probes to the first readout binding target of one of the one or more readout binding targets.

In some embodiments, the composition further comprises: a second plurality of readout probes, where each readout probe comprises: a readout binding sequence that specifically binds to a second readout binding target of the one or more readout binding targets in a first overhang sequence, and a signal moiety linked to the readout binding sequence via a cleavable linker. In these embodiments, the signal moiety is capable of emitting a second detectable visual signal upon binding of each readout probe from the second plurality of readout probes to the second readout binding target of the one or more readout binding targets.

In some embodiments, a primary probe further comprises: a second overhang sequence, linked to the other end of the primary binding sequence, where the second overhang sequence comprises one or more additional readout binding targets connected in series.

In some embodiments, the composition further comprises a third plurality of readout probes, where each readout probe comprises: a readout binding sequence that specifically binds to a first additional readout binding target of the one or more additional readout binding targets in a second overhang sequence, and a signal moiety linked to the readout binding sequence via a cleavable linker. In these embodiments, the signal moiety is capable of emitting a third detectable visual signal upon binding of each readout probe from the third plurality of readout probes to the first additional readout binding target of the one or more additional readout binding targets.

In some embodiments, the composition further comprises a fourth plurality of readout probes, where each readout probe comprises: a readout binding sequence that specifically binds to a second additional readout binding target of the one or more additional readout binding targets in a second overhang sequence, and a signal moiety linked to the readout binding sequence via a cleavable linker. In these embodiments, the signal moiety is capable of emitting a fourth detectable visual signal upon binding of each readout probe from the fourth plurality of readout probes to the second additional readout binding target of the one or more additional readout binding targets.

In any of the embodiments disclosed herein, the cleavable linker is selected from the group consisting of an enzyme cleavable linker, a nucleophile/base sensitive linker, a reduction sensitive linker, a photo-cleavable linker, an electrophile/acid sensitive linker, a metal-assisted cleavable linker, and an oxidation sensitive linker.

In any of the embodiments disclosed herein, the cleavable linker is a disulfide bond or a nucleic acid restriction site.

In any of the embodiments disclosed herein, the one or more readout binding targets comprise three or more readout binding targets.

In embodiments where a second overhang sequence is present, the one or more additional readout binding targets comprise three or more readout binding targets.

In some embodiments, the target nucleic acid molecule is an mRNA or a DNA. In some embodiments, the target nucleic acid molecule is within an intact mammalian cell. In some embodiments, the intact mammalian cell is a human cell.

In one aspect, disclosed herein is a sequential hybridization method utilizing a plurality of primary probes and a first plurality of readout probes. The method comprises the steps of: a) contacting a target nucleic acid molecule with a plurality of primary probes. Each primary probe comprises: a primary binding sequence that binds to a complementary target sequence within the target nucleic acid molecule, and a first overhang sequence connected to one end of the primary binding sequence, wherein the first overhang sequence comprises one or more readout binding targets connected in series; and b) contacting, after step a), the target nucleic acid molecule with a first plurality of readout probes. Each readout probe comprises: a readout binding sequence that specifically binds to a first readout binding target of the one or more readout binding targets of a primary probe of the plurality of primary probes, and a signal moiety linked to the readout binding sequence via a cleavable linker.

In these embodiments, the signal moiety is capable of emitting a first detectable visual signal upon binding of each readout probe from the first plurality of readout probes to the first readout binding target of one of the one or more readout binding targets.

In some embodiments, the method further comprises the steps of: b1) imaging the target nucleic acid molecule after step b) so that interactions between the first plurality of readout probes and the first readout binding target of the one or more readout binding targets of a primary probe are detected by the presence of the first detectable visual signal; and b2) applying a cleaving agent to cleave the linker, thereby eliminating the signal moiety from each readout probe in the first plurality of readout probes.

In some embodiments, the method further comprises the steps of: c) contacting, after step b), the target nucleic acid molecule with a second plurality of readout probes. Each readout probe comprises: a readout binding sequence that specifically binds to a second readout binding target of the one or more readout binding targets of a primary probe, and a signal moiety linked to the readout binding sequence via a cleavable linker.

In these embodiments, the signal moiety is capable of emitting a second detectable visual signal upon binding of each readout probe from the second plurality of readout probes to the second readout binding target of the one or more readout binding targets.

In some embodiments, the method further comprises the steps of: c1) imaging the target nucleic acid molecule after step c) so that interactions between the second plurality of readout probes and the second readout binding target of the one or more readout binding targets of a primary probe are detected by the presence of the second detectable visual signal; and c2) applying a cleaving agent to cleave the linker, thereby eliminating the signal moiety from each readout probe in the second plurality of readout probes.

In some embodiments, each primary probe in the plurality of primary probes further comprises: a second overhang sequence connected to the other end of the primary binding sequence, wherein the second overhang sequence comprises one or more additional readout binding targets connected in series.

In some embodiments, the method further comprises the steps of: d) contacting, after step c), the target nucleic acid molecule with a third plurality of readout probes. Each readout probe comprises: a readout binding sequence that specifically binds to a first additional readout binding target of the one or more additional readout binding targets of a primary probe, and a signal moiety linked to the readout binding sequence via a cleavable linker.

In these embodiments, the signal moiety is capable of emitting a third detectable visual signal upon binding of each readout probe from the third plurality of readout probes to the first additional readout binding target of the one or more additional readout binding targets.

In some embodiments, the method further comprises the steps of: d1) imaging the target nucleic acid molecule after step d) so that interactions between the third plurality of readout probes and the first additional readout binding target of the one or more additional readout binding targets of a primary probe are detected by the presence of the third detectable visual signal; and d2) applying a cleaving agent to cleave the linker, thereby eliminating the signal moiety from each readout probe in the third plurality of readout probes.

In some embodiments, the method further comprises the steps of: e) contacting, after step d), the target nucleic acid molecule with a fourth plurality of readout probes. Each readout probe comprises: a readout binding sequence that specifically binds to a second additional readout binding target of the one or more additional readout binding targets of a primary probe, and a signal moiety linked to the readout binding sequence via a cleavable linker.

In these embodiments, the signal moiety is capable of emitting a fourth detectable visual signal upon binding of each readout probe from the fourth plurality of readout probes to the second additional readout binding target of the one or more additional readout binding targets.

In some embodiments, the method further comprises the steps of: e1) imaging the target nucleic acid molecule after step e) so that interactions between the fourth plurality of readout probes and the second additional readout binding target of the one or more additional readout binding targets of a primary probe are detected by the presence of the fourth detectable visual signal; and e2) applying a cleaving agent to cleave the linker, thereby eliminating the signal moiety from each readout probe in the fourth plurality of readout probes.
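When primary probes for different targets carry different combinations of readout binding targets in their overhangs, the successive readout rounds give each target an on/off pattern across rounds. The Python sketch below is a hypothetical illustration of how such patterns could be tabulated and checked for uniqueness. The gene names, the readout target labels, and the binary on/off decoding rule are assumptions made for illustration; they are not the barcoding scheme of this disclosure.

```python
def readout_patterns(probe_targets, round_targets):
    """
    For each target molecule (e.g., a gene), return a tuple of 0/1 flags,
    one per readout round: 1 if that round's readout probes bind one of the
    readout binding targets carried by the molecule's primary probes.
    """
    return {
        gene: tuple(int(t in targets) for t in round_targets)
        for gene, targets in probe_targets.items()
    }


def is_decodable(patterns):
    """True if every target has a non-empty pattern and no two patterns collide."""
    values = list(patterns.values())
    return all(any(p) for p in values) and len(set(values)) == len(values)


# Hypothetical example: overhang readout targets for three genes, four rounds.
probe_targets = {
    "gene-A": {"R1", "R3"},
    "gene-B": {"R2", "R4"},
    "gene-C": {"R1", "R4"},
}
rounds = ["R1", "R2", "R3", "R4"]
patterns = readout_patterns(probe_targets, rounds)
print(patterns)
print(is_decodable(patterns))
```

Under this on/off rule, n rounds allow at most 2**n - 1 distinct non-empty patterns (15 for four rounds), which is one reason schemes that assign several distinguishable signals per round can scale further.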

In some embodiments, the target nucleic acid molecule is an mRNA or a DNA. In some embodiments, the target nucleic acid molecule is within an intact mammalian cell. In some embodiments, the intact mammalian cell is a human cell.

In some embodiments, the cleavable linker is selected from the group consisting of an enzyme cleavable linker, a nucleophile/base sensitive linker, a reduction sensitive linker, a photo-cleavable linker, an electrophile/acid sensitive linker, a metal-assisted cleavable linker, and an oxidation sensitive linker. In some embodiments, the cleavable linker is a disulfide bond or a nucleic acid restriction site.

In some embodiments, the one or more readout binding targets comprise three or more readout binding targets.

In some embodiments where the second overhang sequence is present, the one or more additional readout binding targets comprise three or more readout binding targets.

In one aspect, disclosed herein is a composition comprising a first plurality of nucleic acid detection probes and an extendible signal motif formed by a first plurality of populations of extender probes {EP₁, EP₂, . . . , EP_(n)}. In some embodiments, each nucleic acid detection probe in the first plurality of nucleic acid detection probes comprises: a binding region comprising a binding sequence that binds to a first target sequence; and an initiator sequence linked to the binding region with a cleavable linker. In some embodiments, each population of extender probes is represented by EP₁, EP₂, . . . , EP_(n), respectively, where each extender probe in EP₁ comprises: a binding sequence that binds to all or a part of the initiator sequence; one or more target sequences for extender probes in EP₂ and subsequent populations of extender probes; and a signal moiety capable of emitting a first detectable signal. In some embodiments, each probe in EP₂ and subsequent populations of extender probes comprises: a binding sequence that binds to all or a part of the previous extender sequence; one or more target sequences for probes in subsequent populations of extender probes; and a signal moiety capable of emitting the first detectable signal.
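Because each extender probe carries its own signal moiety and one or more target sequences for the next population, the number of signal moieties per initiator grows with the number of populations. A back-of-the-envelope count can be made under assumptions not stated in the source: one EP₁ probe binds each initiator, every extender probe in EP_(i) carries the same number b of target sequences for EP_(i+1), every site is occupied, and each extender probe carries exactly one signal moiety. The total is then 1 + b + b^2 + . . . + b^(n-1). The Python helper below (its name is hypothetical) computes that sum.

```python
def signal_moieties_per_initiator(n_populations: int, targets_per_probe: int) -> int:
    """
    Signal moieties in one fully assembled extendible signal motif (idealized).

    Assumes one EP1 probe per initiator, `targets_per_probe` target sequences on
    every extender probe for the next population, full occupancy of every site,
    and one signal moiety per extender probe.
    """
    if n_populations < 1 or targets_per_probe < 1:
        raise ValueError("need at least one population and one target sequence per probe")
    # Population EP_(i+1) contributes targets_per_probe**i probes (i = 0 for EP1).
    return sum(targets_per_probe ** i for i in range(n_populations))


print(signal_moieties_per_initiator(3, 1))  # linear motif: 3
print(signal_moieties_per_initiator(3, 2))  # branched motif: 1 + 2 + 4 = 7
```

With b = 1 the motif is a linear chain and the signal grows linearly with n; with b > 1 it branches and grows geometrically, although in practice incomplete occupancy and steric limits would reduce the count.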

In some embodiments, the first target sequence is within a primary probe that directly binds to a target nucleic acid molecule. In some embodiments, the first target sequence is within a secondary probe that binds to a primary probe that directly binds to a target nucleic acid molecule. In some embodiments, the first target sequence is within a tertiary probe that binds to a secondary probe that binds to a primary probe that directly binds to a target nucleic acid molecule.

In some embodiments, the target nucleic acid molecule is an mRNA or a DNA. In some embodiments, the target nucleic acid molecule is within an intact mammalian cell. In some embodiments, the intact mammalian cell is a human cell.

In some embodiments, the cleavable linker is selected from the group consisting of an enzyme cleavable linker, a nucleophile/base sensitive linker, a reduction sensitive linker, a photo-cleavable linker, an electrophile/acid sensitive linker, a metal-assisted cleavable linker, and an oxidation sensitive linker. In some embodiments, the cleavable linker is a disulfide bond or a nucleic acid restriction site.

In some embodiments, each extender probe of the plurality of extender probes comprises a binding sequence that is complementary to all or a part of the initiator sequence in the nucleic acid detection probe, wherein each extender probe forms a hairpin structure, and wherein the presence of the initiator sequence causes the hairpin structure to unfold and initiates a hybridization chain reaction.
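The hairpin-based embodiment corresponds to a hybridization chain reaction (HCR). The Python sketch below is a deterministic toy model built on assumptions that are not stated in the source. It uses the common two-hairpin HCR arrangement (labeled H1 and H2 here), grows all chains in round-robin order, and counts one signal moiety per incorporated hairpin. It illustrates two qualitative points: without an initiator no hairpin opens, and the signal per initiator is bounded by the hairpin supply (or by an optional length cap). The function name and parameters are hypothetical.

```python
def hcr_chain_lengths(n_initiators, n_h1, n_h2, max_len=None):
    """
    Toy two-hairpin HCR: an initiator opens an H1 hairpin, the opened H1 exposes
    a sequence that opens an H2 hairpin, which opens another H1, and so on.
    Chains grow one hairpin at a time in round-robin order until the needed
    hairpin species runs out or a chain reaches max_len.
    Returns the number of hairpins (signal moieties) in each chain.
    """
    lengths = [0] * n_initiators
    stock = {"H1": n_h1, "H2": n_h2}
    growing = True
    while growing:
        growing = False
        for i in range(n_initiators):
            if max_len is not None and lengths[i] >= max_len:
                continue
            # Chains alternate H1, H2, H1, ... starting from the initiator.
            needed = "H1" if lengths[i] % 2 == 0 else "H2"
            if stock[needed] > 0:
                stock[needed] -= 1
                lengths[i] += 1
                growing = True
    return lengths


print(hcr_chain_lengths(0, 10, 10))  # no initiator, no chain: []
print(hcr_chain_lengths(2, 3, 3))    # limited hairpin supply: [4, 2]
```

Real HCR kinetics are stochastic and chain lengths are distributed rather than fixed; the model only tracks stoichiometry.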

In some embodiments, the composition further comprises a second plurality of nucleic acid detection probes and an extendible signal motif formed by a second plurality of populations of extender probes {EP_(1′), EP_(2′), . . . , EP_(n′)}. In some embodiments, each nucleic acid detection probe in the second plurality of nucleic acid detection probes comprises: a binding region comprising a binding sequence that binds to a second target sequence; and an initiator sequence linked to the binding region with a cleavable linker. In some embodiments, each population of extender probes is represented by EP_(1′), EP_(2′), . . . , EP_(n′), respectively, wherein each extender probe in EP_(1′) comprises: a binding sequence that binds to all or a part of the initiator sequence; one or more target sequences for extender probes in EP_(2′) and subsequent populations of extender probes; and a signal moiety capable of emitting a second detectable signal. In some embodiments, each probe in EP_(2′) and subsequent populations of extender probes comprises: a binding sequence that binds to all or a part of the previous extender sequence; one or more target sequences for probes in subsequent populations of extender probes; and a signal moiety capable of emitting the second detectable signal.

In one aspect, disclosed herein is a sequential hybridization method. The method comprises the steps of: a) contacting a target nucleic acid molecule with a first plurality of nucleic acid detection probes; and b) contacting, after step a), the target nucleic acid molecule with a first plurality of populations of extender probes {EP₁, EP₂, . . . , EP_(n)}. In some embodiments, each nucleic acid detection probe in the first plurality of nucleic acid detection probes comprises: a binding region comprising a binding sequence that binds to a first target sequence; and an initiator sequence linked to the binding region with a cleavable linker. In some embodiments, each population of extender probes is represented by EP₁, EP₂, . . . , EP_(n), respectively, where each extender probe in EP₁ comprises: a binding sequence that binds to all or a part of the initiator sequence; one or more target sequences for extender probes in EP₂ and subsequent populations of extender probes; and a signal moiety capable of emitting a first detectable signal. In some embodiments, each probe in EP₂ and subsequent populations of extender probes comprises: a binding sequence that binds to all or a part of the previous extender sequence; one or more target sequences for probes in subsequent populations of extender probes; and a signal moiety capable of emitting the first detectable signal.

In some embodiments, the method further comprises: b1) imaging the target nucleic acid molecule after step b) so that interactions between the first plurality of nucleic acid detection probes and the first target sequences are detected by the presence of the first detectable visual signal; and b2) applying a cleaving agent to cleave the linker, thereby eliminating the extendible signal motif.

In some embodiments, the method further comprises: c) contacting the target nucleic acid molecule with a second plurality of nucleic acid detection probes. In some embodiments, each nucleic acid detection probe in the second plurality of nucleic acid detection probes comprises: a binding region comprising a binding sequence that binds to a second target sequence; and an initiator sequence linked to the binding region with a cleavable linker.

In some embodiments, the method further comprises: d) contacting, after step c), the target nucleic acid molecule with a second plurality of populations of extender probes {EP_(1′), EP_(2′), . . . , EP_(n′)}, where each population of extender probes is represented by EP_(1′), EP_(2′), . . . , EP_(n′), respectively. In some embodiments, each extender probe in EP_(1′) comprises: a binding sequence that binds to all or a part of the initiator sequence; one or more target sequences for extender probes in EP_(2′) and subsequent populations of extender probes; and a signal moiety capable of emitting a second detectable signal. In some embodiments, each probe in EP_(2′) and subsequent populations of extender probes comprises: a binding sequence that binds to all or a part of the previous extender sequence; one or more target sequences for probes in subsequent populations of extender probes; and a signal moiety capable of emitting the second detectable signal.

In some embodiments, the method further comprises: d1) imaging the target nucleic acid molecule after step d) so that interactions between the second plurality of nucleic acid detection probes and the second target sequences are detected by the presence of the second detectable visual signal; and d2) applying a cleaving agent to cleave the linker, thereby eliminating the extendible signal motif.

In some embodiments, the second target sequence is within a primary probe that directly binds to a target nucleic acid molecule. In some embodiments, the second target sequence is within a secondary probe that binds to a primary probe that directly binds to a target nucleic acid molecule. In some embodiments, the second target sequence is within a tertiary probe that binds to a secondary probe that binds to a primary probe that directly binds to a target nucleic acid molecule.

In some embodiments, the target nucleic acid molecule is an mRNA or a DNA. In some embodiments, the target nucleic acid molecule is within an intact mammalian cell. In some embodiments, the intact mammalian cell is a human cell.

In some embodiments, the cleavable linker is selected from the group consisting of an enzyme cleavable linker, a nucleophile/base sensitive linker, a reduction sensitive linker, a photo-cleavable linker, an electrophile/acid sensitive linker, a metal-assisted cleavable linker, and an oxidation sensitive linker. In some embodiments, the cleavable linker is a disulfide bond or a nucleic acid restriction site.

In some embodiments, each extender probe of the plurality of extender probes comprises a binding sequence that is complementary to all or a part of the initiator sequence in the nucleic acid detection probe, where each extender probe forms a hairpin structure, and where the presence of the initiator sequence causes the hairpin structure to unfold and initiates a hybridization chain reaction.

The compositions and methods disclosed herein can be used in sequential hybridizations to identify any suitable cellular targets within an intact cell or in an in vitro setting. In some embodiments, the cellular targets can be mRNAs or DNAs. In some embodiments, the cellular targets can be proteins. For example, the initial target-binding primary probe can be an antibody conjugated with a nucleic acid sequence for subsequent binding.

One of skill in the art would understand that embodiments disclosed herein can be applied or combined in any aspect when applicable.

Definitions

Animal: As used herein, the term “animal” refers to any member of the animal kingdom. In some embodiments, “animal” refers to humans, at any stage of development. In some embodiments, “animal” refers to non-human animals, at any stage of development. In certain embodiments, the non-human animal is a mammal (e.g., a rodent, a mouse, a rat, a rabbit, a monkey, a dog, a cat, a sheep, cattle, a primate, and/or a pig). In some embodiments, animals include, but are not limited to, mammals, birds, reptiles, amphibians, fish, and/or worms. In some embodiments, an animal may be a transgenic animal, a genetically-engineered animal, and/or a clone.

Approximately: As used herein, the terms “approximately” or “about” in reference to a number are generally taken to include numbers that fall within a range of 5%, 10%, 15%, or 20% in either direction (greater than or less than) of the number unless otherwise stated or otherwise evident from the context (except where such number would be less than 0% or exceed 100% of a possible value). In some embodiments, use of the term “about” in reference to dosages means ±5 mg/kg/day.
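As a worked example of this definition, a value of 50 with a 10% window spans 45 to 55. A percentage value of 95 with a 10% window spans 85.5 to 104.5, and the upper end is capped at 100 because the value cannot exceed 100% of what is possible. The small Python helper below reproduces that arithmetic; the function name and the optional `lower`/`upper` bounds are illustrative assumptions.

```python
def approximate_range(value, pct=10.0, lower=None, upper=None):
    """
    Interval covered by "about <value>" for a window of pct percent in either
    direction, optionally clipped to the physically possible bounds.
    """
    delta = abs(value) * pct / 100
    lo, hi = value - delta, value + delta
    if lower is not None:
        lo = max(lo, lower)
    if upper is not None:
        hi = min(hi, upper)
    return lo, hi


print(approximate_range(50, 10))                      # (45.0, 55.0)
print(approximate_range(95, 10, lower=0, upper=100))  # (85.5, 100)
```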

Homology: “Homology” or “identity” or “similarity” refers to sequence similarity between two nucleic acid molecules. Homology and identity can each be determined by comparing a position in each sequence which can be aligned for purposes of comparison. When an equivalent position in the compared sequences is occupied by the same base, then the molecules are identical at that position; when the equivalent site is occupied by the same or a similar nucleic acid residue (e.g., similar in steric and/or electronic nature), then the molecules can be referred to as homologous (similar) at that position. Expression as a percentage of homology/similarity or identity refers to a function of the number of identical or similar nucleic acids at positions shared by the compared sequences. A sequence which is “unrelated” or “non-homologous” shares less than 40% identity, less than 35% identity, less than 30% identity, or less than 25% identity with a sequence described herein. In comparing two sequences, the absence of residues (amino acids or nucleic acids) or the presence of extra residues also decreases the identity and homology/similarity.
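The percentage calculation can be made concrete with a small helper. This Python sketch assumes the two sequences are already aligned, with gaps written as "-". It counts identical, non-gap columns and divides by the number of alignment columns, so gaps and extra residues lower the identity as described above. Other conventions, such as dividing by the length of the shorter sequence, are also in use. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over an existing pairwise alignment (gaps written as '-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap; they carry no information.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT", "ACGA"))    # one mismatch: 75.0
print(percent_identity("ACGT-", "ACGTA"))  # one extra residue: 80.0
```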

In some embodiments, the term “homology” describes a mathematically based comparison of sequence similarities which is used to identify genes with similar functions or motifs. The nucleic acid sequences described herein can be used as a “query sequence” to perform a search against public databases, for example, to identify other family members, related sequences or homologs. In some embodiments, such searches can be performed using the NBLAST and XBLAST programs (version 2.0) of Altschul, et al. (1990) J. Mol. Biol. 215:403-10. In some embodiments, BLAST nucleotide searches can be performed with the NBLAST program, score=100, word length=12 to obtain nucleotide sequences homologous to nucleic acid molecules of the invention. In some embodiments, to obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., (1997) Nucleic Acids Res. 25(17):3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used (See www.ncbi.nlm.nih.gov).

Identity: As used herein, “identity” means the percentage of identical nucleotide residues at corresponding positions in two or more sequences when the sequences are aligned to maximize sequence matching, i.e., taking into account gaps and insertions. Identity can be readily calculated by known methods, including but not limited to those described in Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991; and Carillo, H., and Lipman, D., SIAM J. Applied Math., 48: 1073 (1988). Methods to determine identity are designed to give the largest match between the sequences tested. Moreover, methods to determine identity are codified in publicly available computer programs. Computer program methods to determine identity between two sequences include, but are not limited to, the GCG program package (Devereux, J., et al., Nucleic Acids Research 12(1): 387 (1984)), BLASTP, BLASTN, and FASTA (Altschul, S. F. et al., J. Molec. Biol. 215: 403-410 (1990) and Altschul et al. Nuc. Acids Res. 25: 3389-3402 (1997)). The BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S., et al., NCBI NLM NIH Bethesda, Md. 20894; Altschul, S., et al., J. Mol. Biol. 215: 403-410 (1990)). The well-known Smith-Waterman algorithm can also be used to determine identity.
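
To make the reference to the Smith-Waterman algorithm concrete, the following is a minimal Python sketch of local alignment with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) and the function name are illustrative assumptions. Production tools typically use substitution matrices and affine gap penalties, which this sketch does not implement. The aligned strings it returns can then be scored for percent identity.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment of a and b; returns (best score, aligned a, aligned b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # DP matrix; first row/column stay 0
    best, best_pos = 0, (0, 0)

    def s(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,                          # start a new local alignment
                H[i - 1][j - 1] + s(i, j),  # match or mismatch
                H[i - 1][j] + gap,          # gap in b
                H[i][j - 1] + gap,          # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTTT", "ACGT"))  # (8, 'ACGT', 'ACGT')
```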

In vitro: As used herein, the term “in vitro” refers to events that occur in an artificial environment, e.g., in a test tube or reaction vessel, in cell culture, etc., rather than within an organism (e.g., animal, plant, and/or microbe).

In vivo: As used herein, the term “in vivo” refers to events that occur within an organism (e.g., animal, plant, and/or microbe).

Oligonucleotide: the term “oligonucleotide” refers to a polymer or oligomer of nucleotide monomers, containing any combination of nucleobases, modified nucleobases, sugars, modified sugars, phosphate bridges, or modified bridges.

Oligonucleotides of the present invention can be of various lengths. In particular embodiments, oligonucleotides can range from about 2 to about 200 nucleotides in length. In various related embodiments, oligonucleotides, whether single-stranded, double-stranded, or triple-stranded, can range in length from about 4 to about 10 nucleotides, from about 10 to about 50 nucleotides, from about 20 to about 50 nucleotides, from about 15 to about 30 nucleotides, or from about 20 to about 30 nucleotides. In some embodiments, the oligonucleotide is from about 9 to about 39 nucleotides in length. In some embodiments, the oligonucleotide is at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 15, at least 20, at least 25, or at least 30 nucleotides in length. In some embodiments, the oligonucleotide is a duplex of complementary strands of at least 18 nucleotides in length. In some embodiments, the oligonucleotide is a duplex of complementary strands of at least 21 nucleotides in length.

Predetermined: By predetermined is meant deliberately selected, for example as opposed to randomly occurring or achieved. A composition that may contain certain individual oligonucleotides because they happen to have been generated through a process that cannot be controlled to intentionally generate the particular oligonucleotides is not a “predetermined” composition. In some embodiments, a predetermined composition is one that can be intentionally reproduced (e.g., through repetition of a controlled process).

Probe: As used herein, the term “probe” or “probes” refers to any molecules, synthetic or naturally occurring, that can attach themselves directly or indirectly to a molecular target (e.g., an mRNA sample, DNA molecules, protein molecules, RNA and DNA isoform molecules, single nucleotide polymorphism molecules, etc.). For example, a probe can include a nucleic acid molecule, an oligonucleotide, a protein (e.g., an antibody or an antigen binding sequence), or combinations thereof. For example, a protein probe may be connected with one or more nucleic acid molecules to form a probe that is a chimera. As disclosed herein, in some embodiments, a probe itself can produce a detectable signal. In some embodiments, a probe is connected, directly or indirectly via an intermediate molecule, with a signal moiety (e.g., a dye or fluorophore) that can produce a detectable signal.

Sample: As used herein, the term “sample” refers to a biological sample obtained or derived from a source of interest, as described herein. In some embodiments, a source of interest comprises an organism, such as an animal or human. In some embodiments, a biological sample comprises biological tissue or fluid. In some embodiments, a biological sample is or comprises bone marrow; blood; blood cells; ascites; tissue or fine needle biopsy samples; cell-containing body fluids; free floating nucleic acids; sputum; saliva; urine; cerebrospinal fluid; peritoneal fluid; pleural fluid; feces; lymph; gynecological fluids; skin swabs; vaginal swabs; oral swabs; nasal swabs; washings or lavages such as ductal lavages or bronchoalveolar lavages; aspirates; scrapings; bone marrow specimens; tissue biopsy specimens; surgical specimens; other body fluids, secretions, and/or excretions; and/or cells therefrom, etc. In some embodiments, a biological sample is or comprises cells obtained from an individual. In some embodiments, a sample is a “primary sample” obtained directly from a source of interest by any appropriate means. For example, in some embodiments, a primary biological sample is obtained by methods selected from the group consisting of biopsy (e.g., fine needle aspiration or tissue biopsy), surgery, collection of body fluid (e.g., blood, lymph, feces, etc.), etc. In some embodiments, as will be clear from context, the term “sample” refers to a preparation that is obtained by processing (e.g., by removing one or more components of and/or by adding one or more agents to) a primary sample, for example, by filtering using a semi-permeable membrane. Such a “processed sample” may comprise, for example, nucleic acids or proteins extracted from a sample or obtained by subjecting a primary sample to techniques such as amplification or reverse transcription of mRNA, isolation and/or purification of certain components, etc.

Subject: As used herein, the term “subject” or “test subject” refers to any organism to which a provided compound or composition is administered in accordance with the present invention, e.g., for experimental, diagnostic, prophylactic, and/or therapeutic purposes. Typical subjects include animals (e.g., mammals such as mice, rats, rabbits, non-human primates, and humans; insects; worms; etc.) and plants. In some embodiments, a subject may be suffering from, and/or susceptible to, a disease, disorder, and/or condition.

Substantially: As used herein, the term “substantially” refers to the qualitative condition of exhibiting total or near-total extent or degree of a characteristic or property of interest. One of ordinary skill in the biological arts will understand that biological and chemical phenomena rarely, if ever, go to completion and/or proceed to completeness or achieve or avoid an absolute result. The term “substantially” is therefore used herein to capture the potential lack of completeness inherent in many biological and/or chemical phenomena.

Suffering from: An individual who is “suffering from” a disease, disorder, and/or condition has been diagnosed with and/or displays one or more symptoms of a disease, disorder, and/or condition.

Susceptible to: An individual who is “susceptible to” a disease, disorder, and/or condition is one who has a higher risk of developing the disease, disorder, and/or condition than does a member of the general public. In some embodiments, an individual who is susceptible to a disease, disorder, and/or condition may not have been diagnosed with the disease, disorder, and/or condition. In some embodiments, an individual who is susceptible to a disease, disorder, and/or condition may exhibit symptoms of the disease, disorder, and/or condition. In some embodiments, an individual who is susceptible to a disease, disorder, and/or condition may not exhibit symptoms of the disease, disorder, and/or condition. In some embodiments, an individual who is susceptible to a disease, disorder, and/or condition will develop the disease, disorder, and/or condition. In some embodiments, an individual who is susceptible to a disease, disorder, and/or condition will not develop the disease, disorder, and/or condition.

Treat: As used herein, the term “treat,” “treatment,” or “treating” refers to any method used to partially or completely alleviate, ameliorate, relieve, inhibit, prevent, delay onset of, reduce severity of, and/or reduce incidence of one or more symptoms or features of a disease, disorder, and/or condition. Treatment may be administered to a subject who does not exhibit signs of a disease, disorder, and/or condition. In some embodiments, treatment may be administered to a subject who exhibits only early signs of the disease, disorder, and/or condition, for example for the purpose of decreasing the risk of developing pathology associated with the disease, disorder, and/or condition.

Wild-type: As used herein, the term “wild-type” has its art-understood meaning that refers to an entity having a structure and/or activity as found in nature in a “normal” (as contrasted with mutant, diseased, altered, etc.) state or context. Those of ordinary skill in the art will appreciate that wild-type genes and polypeptides often exist in multiple different forms (e.g., alleles).

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

Those of skill in the art will understand that the drawings, described below, are for illustrative purposes only. The drawings are not intended to limit the scope of the present teachings in any way.

FIG. 1 depicts exemplary embodiments of known methods for coding molecular targets.

FIG. 2 depicts exemplary sequential barcoding of provided methods. (a) Schematic of sequential barcoding. In each round of hybridization, multiple probes (e.g., 24) were hybridized on each transcript, imaged and then stripped by DNase I treatment. The same probe sequences could be used in different rounds of hybridization, but probes were coupled to different fluorophores. (b) Composite four-color FISH data from 3 rounds of hybridizations on multiple yeast cells. Twelve genes were encoded by 2 rounds of hybridization, with the third hybridization using the same probes as hybridization 1. The boxed regions were magnified in the bottom right corner of each image. The matching spots were shown and barcodes were extracted. Spots without co-localization, without the intention to be limited by theory, could be due to nonspecific binding of probes in the cell as well as mis-hybridization. The number of each barcode was quantified to provide the abundances of the corresponding transcripts in single cells. (c) Exemplary barcodes. mRNA 1: Yellow-Blue-Yellow; mRNA 2: Green-Purple-Green; mRNA 3: Purple-Blue-Purple; and mRNA 4: Blue-Purple-Blue.
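
The decoding logic implied by FIG. 2 can be sketched, by way of illustration only, in Python. Four colors over two barcoding rounds give 4² = 16 possible codes, enough for the twelve genes. Because hybridization 3 reuses the probes of hybridization 1, the sketch accepts a co-localized spot only if its third color repeats its first. Only the four barcodes listed in panel (c) are filled in; the dictionary `CODEBOOK` and the helper names are hypothetical.

```python
from collections import Counter

COLORS = ("Blue", "Green", "Purple", "Yellow")  # four-color FISH

# Barcodes from FIG. 2(c), keyed by the colors of hybridizations 1 and 2.
# The remaining genes would be entered the same way (hypothetical table).
CODEBOOK = {
    ("Yellow", "Blue"): "mRNA 1",
    ("Green", "Purple"): "mRNA 2",
    ("Purple", "Blue"): "mRNA 3",
    ("Blue", "Purple"): "mRNA 4",
}

# Two rounds of four colors give 16 codes, enough for 12 genes.
assert len(COLORS) ** 2 >= 12


def decode_spot(colors):
    """Return the gene for one co-localized spot, or None if it is rejected."""
    hyb1, hyb2, hyb3 = colors
    if hyb3 != hyb1:  # hyb 3 reuses hyb 1 probes, so its color must repeat hyb 1
        return None
    return CODEBOOK.get((hyb1, hyb2))


def count_transcripts(spots):
    """Count decoded barcodes per gene; rejected or unknown spots are dropped."""
    return Counter(g for g in map(decode_spot, spots) if g is not None)


spots = [
    ("Yellow", "Blue", "Yellow"),
    ("Yellow", "Blue", "Green"),  # inconsistent repeat round: rejected
    ("Blue", "Purple", "Blue"),
    ("Yellow", "Blue", "Yellow"),
]
print(count_transcripts(spots))  # Counter({'mRNA 1': 2, 'mRNA 4': 1})
```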

FIG. 3A illustrates aspects of an exemplary hybridization experiment.

FIG. 3B depicts an exemplary process for performing a pseudo-color based barcoding scheme.

FIG. 4 depicts an overview illustrating a method for performing a pseudo-color based barcoding scheme: SPOTs (Sequential Probing Of Targets). (a) RNA molecules are captured on Locked Nucleic Acid (LNA) poly(dT) functionalized coverslips, followed by hybridization of 28-32 gene specific primary probes. (b) Schematic to generate 12 pseudo-colors used in decoding each RNA species. In each round of serial hybridizations, 3 ‘colors’ are generated by imaging 3 unique readout probes conjugated with dye Alexa 647, Alexa 594, and Cy3b. After 4 rounds of serial hybridizations, the images are collapsed into 1 image to generate an image with 12 pseudo-colors. (c) Schematic of decoding 5 barcoded rounds based on the 12 pseudo-color coding scheme. (d) Digitized image of 12 pseudo-colors switching based on an actual experimental image. After decoding 5 barcoded hybs, up to 20736 unique RNA species can be decoded with high accuracy and low error rates.
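
The collapse of four serial hybridizations into one 12-pseudo-color image can be sketched with NumPy. This is a minimal sketch under stated assumptions: background-subtracted images arranged as an array of shape (serial hybridization, channel, height, width), a single intensity threshold, and the channel order Alexa 647, Alexa 594, Cy3b. The function names and the rule that a pixel takes the index of its brightest readout are illustrative assumptions, not the image-processing pipeline of the described method.

```python
import numpy as np

CHANNELS = ("Alexa 647", "Alexa 594", "Cy3b")  # assumed channel order
N_SERIAL = 4  # serial hybridizations per barcoding round


def pseudo_color(serial_hyb: int, channel: str) -> int:
    """Map (serial hyb 1..4, dye channel) to a pseudo-color index 1..12."""
    if not 1 <= serial_hyb <= N_SERIAL:
        raise ValueError("serial_hyb must be between 1 and N_SERIAL")
    return (serial_hyb - 1) * len(CHANNELS) + CHANNELS.index(channel) + 1


def collapse_to_pseudocolor(stack: np.ndarray, threshold: float) -> np.ndarray:
    """Collapse a (serial, channel, H, W) stack into one pseudo-color label image.

    Each pixel receives the index (1..12) of its brightest readout, using the
    same ordering as pseudo_color(); pixels below threshold are labeled 0.
    """
    n_serial, n_channels, h, w = stack.shape
    flat = stack.reshape(n_serial * n_channels, h, w)
    labels = flat.argmax(axis=0) + 1
    labels[flat.max(axis=0) < threshold] = 0
    return labels


stack = np.zeros((N_SERIAL, len(CHANNELS), 2, 2))
stack[3, 2, 0, 0] = 10.0  # spot seen in serial hyb 4, Cy3b channel
stack[0, 0, 1, 1] = 5.0   # spot seen in serial hyb 1, Alexa 647 channel
print(collapse_to_pseudocolor(stack, threshold=1.0))
```

Repeating this for each barcoding round and reading the label at a spot's position in every round yields that spot's pseudo-color barcode.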

FIG. 5 illustrates mRNA transcripts immobilized on a surface through the poly-A tail or hydrogel embedment.

FIG. 6 illustrates an exemplary embodiment of gene specific primary probe design and sequential barcoding hybridization on mRNA immobilized on a surface. (a) Design of gene specific primary probes. (b) Primary probe hybridizations on mRNA on the surface; alternatively, the primary probe hybridizations can also be done in solution. (c) Sequential barcoding hybridization on the surface. One or multiple 18-30mer secondary readout probes conjugated with fluorophore hybridize to the primary probes during each round of hybridization.

FIG. 7 illustrates different arrangements of secondary probe binding sites on primary probes. Assuming 24 primary probes are used for each gene and 4 rounds of barcoding hybridization with 4 different unique secondary probe binding sequences, there are various combinations of arranging the secondary probe binding sequences on the primary probes. (a) All 4 unique binding sequences a, b, c, d are placed on all 24 primary probes. (b) Each unique binding sequence is placed separately onto primary probes. In this case, 6 primary probes have unique binding sequence a, 6 primary probes have unique binding sequence b, 6 primary probes have unique binding sequence c, and 6 primary probes have unique binding sequence d. (c) A combination of unique binding sequences is placed on different primary probes for a gene. In this case, the unique binding sequence combinations can be (a,b), (a,c), (a,d), (b,c), (b,d), (c,d), each on 4 primary probes.
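
The three arrangements of FIG. 7 can be enumerated in a short Python sketch, which also counts how many of the 24 primary probes carry each binding sequence (24, 6, and 12 respectively). Under the assumption that signal in a readout round comes only from probes carrying that round's binding sequence, this count indicates how many probes per transcript are labeled in each round. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

SITES = ("a", "b", "c", "d")  # unique secondary probe binding sequences
N_PROBES = 24                 # primary probes per gene


def layout_all():
    """(a) every primary probe carries all four binding sequences."""
    return [SITES] * N_PROBES


def layout_single():
    """(b) each primary probe carries one binding sequence, 6 probes per site."""
    per_site = N_PROBES // len(SITES)
    return [(s,) for s in SITES for _ in range(per_site)]


def layout_pairs():
    """(c) each primary probe carries one of the 6 pairs, 4 probes per pair."""
    pairs = list(combinations(SITES, 2))  # (a,b), (a,c), ..., (c,d)
    per_pair = N_PROBES // len(pairs)
    return [p for p in pairs for _ in range(per_pair)]


for name, layout in [("a", layout_all()), ("b", layout_single()), ("c", layout_pairs())]:
    # Number of primary probes displaying each binding sequence in this layout.
    coverage = Counter(site for probe in layout for site in probe)
    print(name, len(layout), dict(coverage))
```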

FIG. 8 depicts possible ways to extinguish fluorescent signals via (a) flowing in a high concentration of formamide to ‘melt’ off the secondary readout probe or (b) chemical cleavage.

FIG. 9 depicts a schematic illustration of serial hybridization to scale up the number of available ‘fluorophores’. For example, using 3 fluorophores and a total of 12 unique secondary readout sequences, 3 unique secondary readout probes coding for hyb 1 are flowed in at a time, imaged, and then their signal is extinguished. Another 3 unique secondary readout probes are then flowed in and undergo the same process. With a total of 4 rounds of serial hybridizations (n=4), 12 unique secondary readouts with 3 colors are used to generate the barcode color for hyb 1. The same process is repeated for barcoding hyb 2, . . . to hyb N. Essentially, a combination of serial hybridization and sequential barcoding hybridization allows us to increase the number of fluorophores (F) from 3 to 12, and if the number of barcoding hybridizations (N)=4, the barcode capacity will be (3×4)⁴=20736, which can cover almost the entire transcriptome.
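
The arithmetic in FIG. 9 can be checked with a small Python sketch. It assumes no barcoding round is reserved for error correction; the function names are illustrative. It contrasts the capacity with three dyes alone, the capacity with twelve pseudo-colors, and the number of readout hybridizations that must be imaged.

```python
def barcode_capacity(n_dyes: int, n_serial: int, n_barcode_rounds: int) -> int:
    """Number of distinct barcodes: (dyes x serial hybs) ** barcoding rounds."""
    pseudo_colors = n_dyes * n_serial
    return pseudo_colors ** n_barcode_rounds


def readout_hybridizations(n_serial: int, n_barcode_rounds: int) -> int:
    """Readout hybridizations to image: one serial series per barcoding round."""
    return n_serial * n_barcode_rounds


print(barcode_capacity(3, 1, 4))      # 81: three dyes without serial hybs
print(barcode_capacity(3, 4, 4))      # 20736: twelve pseudo-colors, N = 4
print(readout_hybridizations(4, 4))   # 16
```

The design trade-off is visible here: capacity grows as the N-th power of the number of pseudo-colors, while the imaging cost grows only linearly with the number of serial hybridizations.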

FIG. 10 depicts a table showing possible barcodes obtained using the pseudo-color barcode scheme.

FIG. 11 depicts an exemplary embodiment of data analysis.

FIG. 12 depicts an exemplary raw image from a barcoding experiment.

FIG. 13 depicts exemplary results from data analysis.

FIG. 14 depicts exemplary results from data analysis.

FIG. 15 depicts exemplary results from translational profiling analysis.

FIG. 16 depicts exemplary results from translational profiling analysis.

FIG. 17 illustrates results of gene specific primary probe design. (a) Each primary probe comprises a 25-nt gene specific sequence complementary to the mRNA, 4 readout sequences, and 2 primer binding sites. Each gene is targeted by a minimum of 28 to 32 primary probes. (b) Both priming regions (grey in the probe schematic) used in synthesizing gene specific primary probes are also used as a registration marker through the hybridization of Alexa 488 conjugated readout probes. The majority of the fluorescent spots remain even after 20 rounds of hybridization. (Scale bars: 2 μm.)

FIG. 18 illustrates that fluorescent switching through cleavage of disulfide-conjugated dye on readout probes is highly efficient. (a) 20 rounds of hybridization are accomplished by extinguishing fluorescent signals through reduction of the disulfide linkage between dye and readout probes using TCEP, followed by re-hybridization of the next unique secondary readout probes. The amide bond between the Alexa 488 dye (shown in yellow) and primer readout probes used as a registration marker is not affected by TCEP. (b) The fluorescent signals in each channel after treatment with 50 mM TCEP for 5 minutes at room temperature are reduced to minimal or none. (Scale bars: 5 μm.)

FIG. 19 illustrates RNA Sequential Probing Of Targets (SPOTs) measurement of 10,212 genes. (a) Correlation of RNA SPOTs with RNA-seq in fibroblasts, with a Pearson r coefficient of 0.86. (b) Two SPOTs replicate experiments, demonstrating high reproducibility. (c) Correlation of RNA SPOTs measurement with RNA-seq in mESCs, with a Pearson r coefficient of 0.90. (d) Correlation with averaged smFISH counts for 24 genes in mESCs. RNA SPOTs has single molecule sensitivity and is highly accurate in transcriptome profiling.

FIG. 20 illustrates RNA SPOTs analysis at lower depth. (a) Correlation between RNA-seq FPKM and RNA SPOTs SPM from another replicate is high when a total of 376,781 spots are counted. SPM, spots per million; FPKM, fragments per kilobase per million reads. (b) High reproducibility of RNA SPOTs between the two replicates in profiling ES-E14 cell gene expression (n1=376,781 spots, n2=1,688,747 spots).
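
The normalization and correlation used in FIGS. 19 and 20 can be sketched with NumPy: spot counts per gene are scaled to spots per million (SPM), and a Pearson r is computed against a reference such as RNA-seq FPKM. The optional log10 transform with a pseudocount is an assumption, since the document does not state whether correlations were computed on a log scale. The toy numbers are hypothetical and are not measurements from this document.

```python
import numpy as np


def spots_per_million(counts) -> np.ndarray:
    """Normalize per-gene spot counts to spots per million (SPM)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum() * 1e6


def pearson_r(x, y, log10: bool = False, pseudocount: float = 1.0) -> float:
    """Pearson correlation of two per-gene expression vectors.

    The optional log10(value + pseudocount) transform is an assumption; the
    document does not state whether correlations use a log scale.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if log10:
        x, y = np.log10(x + pseudocount), np.log10(y + pseudocount)
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical toy data for four genes (not measurements from the document).
spot_counts = [120, 30, 900, 5]
fpkm = [40.0, 11.0, 310.0, 2.5]
spm = spots_per_million(spot_counts)
print(spm.round(1), round(pearson_r(spm, fpkm), 3))
```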

FIG. 21 shows raw images of 20 rounds of fluorescent switching in channel 647. Bright dots are the real targets, while dim dots are due to nonspecific binding. The switching between each round of hybridization is complete, with minimal retention of fluorescent signals from the previous round. (Scale bars: 2 μm.)

FIG. 22 depicts assessment of primary probe non-specific binding. (a) Raw images of the 532 channel with mRNA captured on coverslips through LNA poly(dT). (b) No bright fluorescent signals are observed in the absence of mRNA on coverslips as a control. The left image has the same contrast as (a), while the right image contrast has been increased 4.5-fold to better illustrate the non-specific fluorescent signals. (c) Quantitative measurement of fluorescent intensity in channel 647 with and without the presence of mRNA. A threshold can be set to distinguish between the two populations to identify the real signals. (d) & (e) Same as (c) but for channel 594 and channel 532.
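
The document states only that a threshold can be set between the two intensity populations; it does not say how. One standard way to choose such a threshold automatically is Otsu's method, which picks the histogram split that maximizes the between-class variance. The NumPy sketch below uses that swapped-in technique with hypothetical simulated intensities, and should be read as an illustration rather than the procedure used for FIG. 22.

```python
import numpy as np


def otsu_threshold(intensities, bins: int = 256) -> float:
    """Threshold separating two intensity populations (Otsu's method).

    Chooses the histogram split that maximizes the between-class variance.
    """
    x = np.asarray(intensities, dtype=float)
    hist, edges = np.histogram(x, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)   # number of values at or below each bin
    w1 = w0[-1] - w0       # number of values above each bin
    m0 = np.cumsum(hist * centers)
    mu0 = m0 / np.where(w0 == 0, 1, w0)
    mu1 = (m0[-1] - m0) / np.where(w1 == 0, 1, w1)
    between = w0 * w1 * (mu0 - mu1) ** 2
    k = int(np.argmax(between))
    return float(edges[k + 1])  # upper edge of the last "background" bin


rng = np.random.default_rng(0)
background = rng.normal(200, 30, size=500)  # hypothetical no-mRNA intensities
signal = rng.normal(1200, 150, size=200)    # hypothetical real-spot intensities
t = otsu_threshold(np.concatenate([background, signal]))
print(round(t), int((signal > t).sum()), int((background > t).sum()))
```

Spots brighter than the returned threshold would be kept as real signals; when the populations overlap, any single threshold trades false positives against missed spots.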

FIG. 23 depicts that smFISH measurement in single cells correlates with RNA SPOTs measurement in NIH/3T3 cells. (a) Raw images of the 7 genes measured by smFISH in NIH/3T3 cells. (Scale bars: 5 μm.) (b) The averaged RNA smFISH counts agree with RNA SPOTs SPM (spots per million) with a Pearson correlation coefficient of 0.88, indicating RNA SPOTs quantitation is accurate. Error bars represent the standard error of the mean (SEM) across different single cells.

FIG. 24 depicts that mRNA can be immobilized by polyacrylamide hydrogel on a bind-silane treated coverslip. (a) mRNA is trapped in the hydrogel mesh once acrylamide and bis-acrylamide monomers crosslink completely on the coverslip. (b) smFISH detection of ACTB once the total RNA is captured on a coverslip through LNA poly(dT) capturing (left) or polyacrylamide hydrogel (right). Negative control (channel 488) shows that the fluorescent signals are not coming from nonspecific sources. (Scale bars: 5 μm.)

FIG. 25 depicts an exemplary reaction scheme for synthesizing DNA probes conjugated to dye through a cleavable disulfide linker.

FIG. 26. An exemplary embodiment illustrating sequential barcoding using gene specific primary probes, secondary bridge probes and tertiary readout probes. (a) Sequential barcoding FISH (seqFISH) with DNA readout probes conjugated with dyes through disulfide linkage. The scheme begins with hybridization of gene specific primary probes, followed by secondary bridges with readout binding sites, and unique tertiary readout probes with disulfide-linked dye. Once imaged, a reducing agent such as TCEP/DTT is used to eliminate the fluorescent signals. Subsequent hybridization gives fluorescent signals that are not interfered with by previous fluorescent spots. The secondary bridges can be stripped off using a high concentration of formamide and replaced by a new set of secondary bridges. (b) An exemplary embodiment illustrating primary probes with two overhang sequences, one of the alternate designs of gene specific primary probes and secondary bridges. For example, with 2 overhangs on primary probes, each overhang can bind 1 secondary bridge which consists of 3 unique tertiary readout probe binding sites. By using 4 different colors of fluorophore, one can scale up the barcodes to 4⁶=4096 with this design.

FIG. 27A depicts an exemplary hybridization chain reaction (HCR).

FIG. 27B depicts an exemplary readout probe.

FIG. 27C depicts an exemplary hybridization chain reaction based onreadout probes with cleavable linkers.

FIG. 28 depicts an exemplary embodiment illustrating rehybridization in mouse embryonic stem cells (mESCs). (a) First hybridization with tertiary readout probes conjugated with A647. Real fluorescent spots are shown in the red dashed box. (b) No fluorescent spots are observed in channel 594 during the first hybridization. (c) After TCEP treatment and washing steps, channel 647 is reimaged. The observed dim dots are non-specific sticking of dyes, which does not interfere with subsequent identification of real fluorescent spots. (d) Second unique readout probes are hybridized to the secondary bridge binding sites to give real fluorescent spots that appear at the same positions as in (a).

We also captured in vitro transcribed polyA-tailed dCAS9-EGFP mRNA on dT20 Locked Nucleic Acid (LNA) surface-modified coverslips to show that the rehybridization scheme works on coverslips.

FIG. 29 depicts an exemplary embodiment illustrating rehybridization on mRNA captured on dT20 LNA surface-modified coverslips. (a) (Left) First round of hybridization with tertiary probes conjugated with A647. (Right) No fluorescent spots are observed in channel 594 during the first hybridization. (b) (Left) Channel 647 has minimal to no leftover fluorescent signals. (Right) Fluorescent spots from the second round of hybridization, which appear at the same positions as in the first hybridization.

FIG. 30 illustrates an exemplary process for error correction.

FIG. 31 illustrates an exemplary computer system for implementing the error correction methods disclosed herein.

FIG. 32 depicts an overview of Sequential barcode FISH (seqFISH) in brain slices. A). A coronal section from a mouse brain was mounted on a slide and imaged in all boxed areas. Each image was taken at 60× magnification. B). Example of barcoding hybridizations from one cell in a field from A. The same points are re-probed through a sequence of 4 hybridizations (numbered). The sequence of colors at a given location provides a barcode readout for that mRNA (“barcode composite”). These barcodes are identified by referencing a lookup table, abbreviated in D, and quantified to obtain single cell expression. In principle, the maximum number of transcripts that can be identified with this approach scales as F^N, where F is the number of fluorophores and N is the number of hybridizations. Error correction adds another round of hybridization. C). Serial smHCR is an alternative detection method where 5 genes are quantified in each hybridization and repeated N times. Serial hybridization scales as F×N. D). Schematic for multiplexing 125 genes in single cells. 100 genes are multiplexed in 4 hybridizations by seqFISH barcoding. This barcode scheme is tolerant to loss of any round of hybridization in the experiment. 25 genes are serially hybridized 5 genes at a time by 5 rounds of hybridization. Each number represents a color channel in single molecule HCR. As a control, 5 genes are measured both by double rounds of smHCR and by barcoding in the same cell. E). smHCR amplifies signal from individual mRNAs. After imaging, DNase strips the smHCR probes from the mRNA, enabling rehybridization on the same mRNA (step a). The “color” of an mRNA can be modulated by hybridizing probes that trigger HCR polymers labeled with different dyes (step b). mRNAs are amplified following hybridization by adding the complementary hairpin pair (step c). The DNase smHCR cycle is repeated on the same mRNAs to construct a predefined barcode over time.
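
FIG. 32 states that the 100-gene barcode scheme tolerates the loss of any one hybridization round but does not spell out the construction at this point. One standard way to obtain that property, offered here only as an assumed illustration, is a parity round. With F = 5 colors (matching the five smHCR channels of panel C), the first three rounds carry 5³ = 125 combinations, enough for 100 genes, and the fourth round is their sum modulo 5. Any two such codewords differ in at least two rounds, so the three surviving rounds determine the dropped one. All names below are hypothetical.

```python
from itertools import product

F = 5  # colors per hybridization round (assumed)


def make_codebook(n_genes: int):
    """First n_genes codewords (c1, c2, c3, parity), parity = (c1+c2+c3) mod F."""
    words = [(a, b, c, (a + b + c) % F) for a, b, c in product(range(F), repeat=3)]
    if n_genes > len(words):
        raise ValueError("not enough codewords for the requested number of genes")
    return words[:n_genes]


def recover(observed):
    """Complete a 4-round readout in which at most one round is None (dropped).

    Returns the full codeword, or None if two or more rounds are missing or the
    parity check fails on a complete readout.
    """
    missing = [i for i, v in enumerate(observed) if v is None]
    if len(missing) > 1:
        return None
    word = list(observed)
    if not missing:
        return tuple(word) if sum(word[:3]) % F == word[3] else None
    i = missing[0]
    if i == 3:
        word[3] = sum(word[:3]) % F  # recompute the parity round
    else:
        others = sum(v for j, v in enumerate(word[:3]) if j != i)
        word[i] = (word[3] - others) % F  # solve c1 + c2 + c3 = parity (mod F)
    return tuple(word)


codebook = make_codebook(100)
lookup = {w: gene for gene, w in enumerate(codebook)}  # codeword -> gene index
readout = (codebook[42][0], None, codebook[42][2], codebook[42][3])  # round 2 lost
print(lookup.get(recover(readout)))  # 42
```

A recovered codeword that is valid but not assigned to any gene returns None from the lookup and would be treated as off-target.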

FIG. 33 illustrates an example of accurate in situ quantification of mRNA levels generated by seqFISH. A). Image of seqFISH barcoding 100 genes in the outer layer of the mouse cortex. RNA dots in the image are z-projected over 15 μm. Individual mRNA points are shown across 4 hybridizations in the inset images. White squares correspond to identified barcodes, yellow squares correspond to missing transcripts in a particular hybridization, and red squares correspond to spurious false positives, which are not counted in any barcode measurements. Numbers in the squares correspond to barcode indices. B). seqFISH correlates with smHCR counts. After barcoding, 5 target mRNAs were measured twice by smHCR in the same cells, providing absolute counts of the transcripts. The two techniques correlate with an R=0.85 and a slope (m) of 0.84 (n=3851 measurements). The 2D histogram intensity shows the distribution of points around the regression line. A high density of points is seen along the regression line, and the density falls off steeply away from it. C). Error correction results in a median gain of 373 (25%) counts per cell (n=3497). Red and blue curves correspond to the total barcode counts per cell before and after error correction. D). Dropped and off-target barcodes represent a small source of error in seqFISH. 100 on-target barcodes and 525 off-target barcodes are measured per cell. Dropped barcodes are due to at least two overlapping dots appearing within the same region. E). Off-target barcodes are rarely observed and contribute minimally to the expression profile in single cells. Each of the 100 on-target barcodes (blue) and 525 off-target barcodes (red) are quantified per cell. The mean is shown with shaded regions corresponding to 1 SD (N=41 imaged regions).

FIG. 34 depicts an example illustrating that distinct clusters of cells exhibit different regional localization in the brain. A). Gene expression of 14,908 cells presented as a Z-score normalized heatmap. B). Regional compositions of 13 cell clusters are visualized as stacked bar plots, with the area corresponding to the number of cells in each region. Hippocampal regions are: CA3, CA1, Dentate Gyrus (DG). Cortical regions: parietal and temporal. Box plots of the Z scores of 21 representative genes are plotted for each cell class. The major tick marks correspond to Z score 0, while every minor tick is a Z score interval of 1. Cell type assignments are shown on the dendrogram. Abbreviations: Hippocampus pyramidal (Hipp), cortex (Cort), Dentate Gyrus (DG), Interneurons (Int), Astrocytes (Astro), Microglia (μGlia). C). Subclusters of cluster 6 cells and their regional localization and gene expression profile displayed under the dendrogram. Subcluster 6.1 is enriched in the CA3, while 6.7 is enriched in the DG. D). Subclusters of cluster 7 cells are shown. Almost all cells are localized in the GCL but have different combinatorial expression profiles. Note that Calb1 expression, which marks granule cell maturation, differs amongst subclusters. E). Any random subset of 25 genes can recapitulate approximately 50% of the information in the correlation amongst cells (red), but a larger number of genes is required to accurately assign cells to clusters using a random forest algorithm (blue) (n=10 bootstrap replicates; shading is 95% CI), indicating that fine structures in the data require quantitative measurements of combinatorial expression of many genes. F). Similar to E, while the first ten PCs explain the coarse structure, a larger number of principal components (PCs) are required to describe the full data. Expected variation (green) and accuracy in predicting cell identity using a random forest model (blue).

FIG. 35 depicts an example embodiment illustrating spatial layering of cell classes in the Dentate Gyrus (DG). A-B). Suprapyramidal and infrapyramidal blades of the DG. Cells of the subgranular zone (SGZ) and granule cell layer (GCL) are arranged in laminar layers in mirror-symmetric patterns on the upper and lower blades. C). The SGZ stays on the inner layer of the DG fork. D). Cells are patterned in the crest. The numbered color key corresponds to cluster numbers in FIG. 34b. E). Letters in the cartoon of the DG correspond to images. F). 3D image of the fork region shown in C).

FIG. 36 depicts an example embodiment illustrating that subregions of the hippocampus are composed of distinct compositions of cell classes based on the first 125-gene experiment. Upper right panel: cartoon of the hippocampus with imaged regions labeled. The color key corresponds to the classes in FIG. 39b. A-D). These images are regions from the CA1d. Astrocytes (Astro) are marked in image A) and a microglia cell (μGlia) is marked in image B). Moving along the hippocampus from CA1 dorsal to ventral, cell classes transition from a homogeneous dorsal population (C to D) to a mixed population in the CA1 intermediate (E-F) to regions of even larger cellular diversity in the CA1 ventral region (G-I). The dotted line in D) marks the transition point of the CA1d to the CA1i. E) shows two laterally segregated cell classes (marked by a dotted line) in the CA1i along with cholinergic interneurons (Int) on the interior surface of the CA1i. The ventral (J-K) and intermediate CA3 (L-M) have cell class compositions similar to the CA1v and CA1i. The last two regions (O-P) of the dorsal CA3 show distinct cell class compositions that are relatively homogeneous within a field but differ from other fields of CA3. The cell class composition of field P is similar to that of the CA1d, but these cluster 6 cells are grouped into a distinct subcluster.

FIG. 37 depicts an example embodiment showing mapping of cell types to a second brain slice with 125 genes. Upper right panel: cartoon of the hippocampus with imaged regions labeled. The color key corresponds to the classes in FIG. 34b. A-D. Similar to the cell class compositions shown for the hippocampus in FIG. 36, CA1d in this second coronal section from a second mouse is composed mostly of cluster 6 cells. (E) CA1i region and (F-G) the CA1 ventral regions are again composed of cell classes similar to those shown in FIG. 36, with increasing diversity of cell class compositions from the CA1d to the CA1i to finally the CA1v. (H-J) CA3 regions. (K-M) DG regions showing the same cell classes and layer pattern of the GCL and SGZ shown in FIG. 35.

FIG. 38 depicts an example embodiment, showing mapping of cell types to a third brain slice with 249 genes. Upper right panel: cartoon of the hippocampus with imaged regions labeled. Color key corresponds to the classes in FIG. 46C. A-C) Similar to the slices shown in FIGS. 36 and 37, the CA1d is relatively homogeneous in cell cluster composition. D-G) Images from the CA1i region show that its cell class composition differs from that of the CA1d. H-K) Again, similar to FIGS. 36 and 37, images from the CA1 ventral regions show a much more complicated cellular composition and a high degree of cellular heterogeneity. L-R) Images from the CA3 region show that the cellular compositions also create 3-4 subregions within the CA3. The cellular heterogeneity of the CA3 subregions mirrors that of the CA1: the ventral region of the CA3 is very heterogeneous, while the dorsal region of the CA3 is relatively homogeneous. S-T) The DG regions show the distinct SGZ versus GCL layering pattern seen in the previous two brains.

FIG. 39 depicts an example embodiment, showing correlations of the transcription profile across the pyramidal layer. A) mRNA counts in the cell bodies in the Stratum Pyramidale (SP) are grouped within each field of view. A single cell in the Stratum Radiatum (SR) is shown to illustrate individual mRNA localization. Stratum Oriens (SO) is labeled for orientation. B) mRNAs in different subregions of the pyramidal layer show both long-distance spatial correlations and local correlations between neighboring fields. Both CA1 and Dentate Gyrus (DG) show high regional correlations. Correlation is calculated based on the 125-gene experiment. C) Illustration of regional and long-distance correlation patterns observed in B. Correlated regions are colored, and long-distance correlations are shown as dotted lines with their median correlation coefficient written over the dotted line.

FIG. 40 depicts an example embodiment, showing barcode assignments for all genes in the combined hybridization experiment (FIG. 32). Barcode assignments in the 125-gene seqFISH and serial experiment (FIG. 32): 125 genes are profiled, 100 of which are barcoded and 25 of which are identified by serial smHCR hybridizations. Five control genes (Hdx, Vps13c, Zfp715, Fbll1, Slc4a8) were quantified by both techniques. The smHCR round of hybridization for the control genes was performed twice to co-localize signal and obtain an absolute count.

FIG. 41 depicts an example embodiment, showing smHCR performance metrics as compared to smFISH (related to FIG. 32). A) Raw data of Pgk1 transcripts imaged in a brain slice. The transcript was targeted with 2 HCR probe sets and 1 smFISH probe set, each consisting of 24 oligonucleotide probes. The probe sets were hybridized together and imaged in 3 different channels. Green circles are transcripts detected in all channels, yellow circles signify transcripts detected in 2 out of 3 channels, and red circles represent signal found in only 1 channel (false positives due to nonspecific binding). These images show that smHCR and smFISH have similar sensitivity, specificity, and spot size. B) Gain of smHCR vs. smFISH. The mean gain of smHCR is 22.1±11.55 vs. smFISH (n=1338). C) True positive detection rate of smHCR and smFISH per channel: the percentage of true positives (transcripts detected with at least 2 out of 3 probe sets) detected with each probe set (n=1338). D) False positive rate of smHCR and smFISH: the percentage of total dots in a channel not detected in any other channel for 3-color Pgk1 (n=1338). E) All the regions imaged in the coronal section are boxed. Each box represents a field of 216 μm×216 μm. The brain section used for FIGS. 32 and 33 is shown on the left. The middle section is used for FIG. 34 and the right section is used for FIG. 38.

FIG. 42 depicts an example embodiment, showing quantitation of seqFISH (related to FIG. 33). A) All control genes show high correlations between seqFISH and smHCR. B) Number of dropped hybridizations from the barcode. Blue bars represent measured probability, and red bars represent values inferred from a binomial distribution fit to the measured probability. The ratio of full barcodes (4 hybridizations) to 3-hybridization barcodes indicates that transcripts mis-hybridized in 2 rounds are rare. Transcripts missed in 2 or more hybridizations (red bars) could not be recovered by the error-correction algorithm and would be dropped from our quantifications (N=2,115,477 total barcodes). C) Intensity of barcode hybridizations over time. All dots belonging to barcodes are quantified in each hybridization, and their mean intensity is plotted over time, normalized to the first hybridization. The 99% CI of the ratio of means is plotted as a bar over the points but is not visible due to its small size (n=60143 to 111284 points per channel). D) Barcoding confidence ratio. Barcode classes in D) are compared to a null model of barcode observations in which random chance observation should give a ratio of 1. Off-target barcodes are observed at 0.005 times the expected rate, suggesting that seqFISH is highly accurate in correctly counting barcoded transcripts (n=3493 cells). Dark bars on top of the bar plots correspond to 99.999% confidence intervals determined by bootstrap resampling. E) Comparison of average copy numbers per gene as measured by Zeisel et al. and by seqFISH. Single-cell RNA-seq underestimates copy numbers compared to seqFISH.

FIG. 43 depicts an example embodiment, showing gene expression patterns and clustering of the 125-gene dataset (related to FIG. 34). A) Overview of 125-gene expression. Plots show the distribution of each transcript in all 14,908 imaged cells. Note that the last 25 genes have higher expression and were imaged with serial hybridization. B) Violin plots of the Z-score distribution for 125 genes. C) Sub-cluster hierarchy of each of the 13 clusters identified in FIG. 34B. D) PCA eigenvalue analysis of the cell-to-cell correlation matrix. The first 125 PCs and their eigenvalues are shown. As observed in FIG. 31, the first 10 PCs explain 59.5% of the variation in the data, while the remaining 115 PCs are needed to explain the rest. Reflecting this, the eigenvalues of the first 10 components are high, while the remaining eigenvalues are uniform. E) Correlation between gene expression and spatial localization. Each dot represents a pair of cell classes and their correlations in gene expression space (x) and in spatial localization patterns (y) (N=153 pairwise correlations between classes, R=0.67). Classes that are similar in expression have similar localization patterns. F) PCA decomposition separates cells into coherent clusters corresponding to cell classes. Cells are colored according to the clusters displayed in the dendrogram.

FIG. 44 depicts an example embodiment, showing robustness of cell classes to downsampling of cells (related to FIG. 34). To measure how well cluster assignments perform with a limited number of cells, a random forest model was trained on the cell-to-cell correlation matrix of the 6872 cells in the center field of view. The robustness of the clusters was calculated by applying this model to classify the remaining cells and determining the percent accuracy of correct assignment to the clusters presented in FIG. 34b. While some classes can be assigned accurately even with a small number of cells as the initial training set, several classes require a large number of cells to be assigned accurately (n=10 bootstrap replicates, S.E.).

FIG. 45 depicts an example embodiment, showing cell-to-cell correlation analysis as a function of dropping genes (related to FIG. 34). A) Clustered gene-to-gene correlation map for all 125 genes. There are many blocks of highly correlated genes; a few genes do not fall into any block. B) The full cell-to-cell correlation map using all genes in the data set. C) Representative cell-to-cell correlation maps, with the number of genes used to construct each matrix indicated above each plot. Dropping genes from the data degrades the fine structure of the correlation map.

FIG. 46 depicts an example embodiment, showing gene expression patterns and clustering of the 249-gene dataset (related to FIG. 38). A) Overview of 249-gene expression. Plots show the distribution of each transcript in all 2050 imaged cells in the hippocampus. Note that the last 35 genes have higher expression and were imaged with serial hybridization. B) Violin plots of the Z-score distribution for 249 genes. C) Dendrogram with regional localization of the 18 cell clusters for the 249-gene experiment. D) Correlation of seqFISH counts to smHCR counts for the 249-gene experiment. The 2D density histogram shows a high density of points around the regression line that falls off towards the edges of the distribution. E) Cell-to-cell correlation for all 2050 cells in the 249-gene dataset. F) Heat map of the percentage of each cell class in each region of the hippocampus for both 125-gene experiments. These heat maps show that in both 125-gene experiments the same cell classes are used in roughly the same proportions in each subregion. G) Heat map of the percentage of each cell class in each region of the hippocampus for the 249-gene experiment. The same patterns are seen as in the 125-gene experiments (i.e., different regions use different cell classes in varying amounts).

FIG. 47 depicts an example embodiment, showing marker gene expression in the hippocampus (related to FIG. 38). A) The top panel outlines, in a yellow box, the region of the hippocampus being shown. The images show the raw gene expression patterns seen using smHCR in our data at the dorsal-most tip of the CA3 for a representative set of cell identity markers used in the 249-gene experiment. The transcript expression profile is shown in red, Nissl staining in green, and DAPI staining in blue. Each image shown is the full field of view and a maximum intensity projection over 15 μm. B) Set of images showing the distinction between the GCL and SGZ. The GCL shows a high level of Nissl staining and expression of neuronal genes such as slc17a7 and camkII. The SGZ shows an absence of Nissl staining and of terminal neuron marker genes. The transcript expression profile is shown in red, Nissl staining in green, and DAPI staining in blue. Each image shown is the full field of view (216 μm×216 μm) and a maximum intensity projection over 15 μm.

FIG. 48 depicts an example embodiment, showing comparison of seqFISH expression data to Allen Brain Atlas expression data (related to FIG. 39). A) ISH data from the Allen Brain Atlas (ABA) for genes seen to be enriched in the SGZ in the 125- and 249-gene seqFISH experiments. In the 125-gene experiment, mertk and mfge8 were found to be enriched in the SGZ. In the 249-gene experiment, nfia and sox11 were seen to be enriched in the SGZ. ABA ISH data show patterns similar to those observed with seqFISH for the SGZ. B-C) Comparison of averaged z-score values per cell from seqFISH to ABA data across the hippocampus. B) The Amigo2 Z-score profile found across the different fields of the hippocampus using seqFISH is shown on top, and the ABA ISH image for Amigo2 is shown on the bottom. C) The Gpc4 Z-score profile found across the different fields of the hippocampus using seqFISH is shown on top, and the ABA ISH image for Gpc4 is shown on the bottom.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

Among other things, the present invention provides new methods, compositions and/or kits for profiling nucleic acids (e.g., transcripts and/or DNA loci) in cells.

In some embodiments, the present invention provides methods for profiling nucleic acids (e.g., transcripts and/or DNA loci) in cells. In some embodiments, provided methods profile multiple targets in single cells. Provided methods can, among other things, profile a large number of targets (transcripts, DNA loci or combinations thereof) with a limited number of detectable labels through sequential barcoding.

Pseudo-Color Based Barcoding

In one aspect, disclosed herein are pseudo-color based barcoding methods. For example, pre-designed barcodes are associated with specific molecular targets through sequential hybridization experiments. A pseudo-color based barcoding scheme is developed to overcome limitations in the previous generation of the technology, such as the lack of visual signals that can be associated with the probes or the small internal space within cells when carrying out in situ experiments. The current method can be applied to both in vitro and in situ analysis. According to the method, each barcoding round comprises multiple serial hybridizations, in which a small number of colored signals (that are associated with probes) are used in each hybridization experiment within a serial hybridization round. Images from each serial hybridization experiment within the same serial hybridization round are combined to form a composite image for each barcoding round. In each barcoding round, the same set of molecular targets is analyzed. After all barcoding rounds are completed, the association of the barcodes with these molecular targets is complete.

To distinguish it from existing FISH methods, the current pseudo-color based barcoding method is referred to as Sequential Probing Of Targets (“SPOTs”).

SPOTs offers numerous advantages over existing barcoding methods. For example, it does not require a user to have a large number of detectable label molecules, which can save time and money.

The pseudo-color scheme can overcome the spot-density (optical crowding) problem in both in situ and in vitro applications. In the in vitro case, implemented with SPOTs, capturing transcripts onto an oligo(dT) surface and adjusting the dilution factor can easily remove optical crowding problems and allow the transcriptome to be imaged by seqFISH.

In addition, pseudo-color is more efficient in terms of imaging time than all existing imaging methods, including expansion microscopy. For example, with 3 fluorophores, 10 rounds of hybridization provide 3^(10−1)=19,683 barcodes, which is enough to code for the transcriptome with one round of error correction. Thus, a total of 3×10=30 frames of imaging is required. In the pseudo-color scheme, 20 pseudo-colors can be used for 4 rounds of hybridization to code for 20^(4−1)=8000 genes in each of the three fluorophore channels, for a total of 24,000 genes. This requires 3×20×4=240 frames to image, an 8-fold increase in imaging time. However, because the mRNA is effectively diluted into 3×20=60 pseudo-channels instead of 3 fluorophore channels without pseudo-color coding, the density problem is alleviated by a factor of 20. The target spots can be localized to nanometer precision by Gaussian fitting to decrease the density in each pseudo-color channel before barcode alignment. Thus, the benefit of pseudo-color coding is that it decreases the density of target spots in the cell while saving imaging time compared to expansion microscopy, where expanding the sample 20-fold requires an additional 20-fold increase in imaging time. Since imaging time is rate-limiting in any sequential imaging method, pseudo-color solves a major problem in implementing transcriptome profiling in situ.
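The frame and capacity arithmetic in the preceding paragraph can be checked with a few lines of Python. This is a minimal sketch that only restates the counts given in the text (10 rounds with one error-correction round for conventional seqFISH; 20 pseudo-colors per fluorophore channel over 4 rounds, one of them for error correction, for the pseudo-color scheme); the variable names are illustrative and not taken from the source.

```python
# Illustrative check of the imaging-budget arithmetic (variable names are hypothetical).
F = 3  # number of fluorophores

# Conventional seqFISH: 10 rounds, one of which is spent on error correction.
conv_rounds = 10
conv_codes = F ** (conv_rounds - 1)   # 3^9 barcodes
conv_frames = F * conv_rounds         # one frame per channel per round

# Pseudo-color scheme: 20 pseudo-colors per channel over 4 rounds,
# one of which is spent on error correction.
P, pc_rounds = 20, 4
codes_per_channel = P ** (pc_rounds - 1)   # 20^3 genes per fluorophore channel
pc_codes = F * codes_per_channel           # total genes across all channels
pc_frames = F * P * pc_rounds              # frames to image

time_increase = pc_frames // conv_frames   # extra imaging time vs. conventional
pseudo_channels = F * P                    # spots are spread over this many channels
density_reduction = pseudo_channels // F   # relative to 3 plain fluorophore channels

print(conv_codes, conv_frames, pc_codes, pc_frames, time_increase, density_reduction)
```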

Overview

In fluorescence in situ hybridization (FISH) methods, fluorescence labels that are capable of producing different visual signals are used to detect multiple molecular targets (such as mRNA transcripts) at the same time in one hybridization experiment.

FIG. 1 illustrates three types of hybridization methods, where combinations of visual signals (barcoding schemes) are used to specifically encode individual molecular targets. FIG. 1a) shows a spatial barcode scheme where different colored probes are bound to the same target. The probes of the same color are grouped and spatially separated from those of another color.

FIG. 1b) depicts a spectral barcode scheme where different colored probes are no longer separated into groups according to their respective colors. Probes bound along a target are resolved by spectral methods to create a barcode.

The approaches in FIGS. 1a) and 1b) both rely on combinations of signals (colors) and are heavily limited by the types of signals that are available and by the resolution at which microscopic instruments can resolve such signals.

As noted in FIG. 1, when 5 different color dyes are associated with probes to identify different molecular targets, the spatial barcoding scheme in FIG. 1a) leads to a barcoding capacity of 720; i.e., up to 720 molecular targets can be uniquely coded. Using the approach of FIG. 1b), a maximum of only 31 molecular targets can be uniquely coded.

The approach in FIG. 1c), known as seqFISH, greatly expands coding capacity. Here, instead of cramming all coding signals into a single hybridization experiment, barcodes are formed via temporally separated, sequential hybridization experiments. The color of the probes binding to a target can remain the same or change to a different color in different rounds. The barcode for a target is determined by the order in which each color appears during sequential rounds of hybridization. Details of this barcoding scheme are depicted in FIG. 2 and can be found in International Patent Publication No. WO 2014/182528, which is hereby incorporated by reference in its entirety.

This method involves lysing cells and immobilizing the mRNA on a surface for single-molecule quantification. However, this technology is not limited to profiling mRNA species; it is also applicable to other molecules such as DNA and proteins.

Since mRNAs are immobilized on a surface, they can be barcoded by introducing several rounds of hybridization, imaging the fluorescent spots, and extinguishing the signal before the next round of hybridization. The number of available barcodes scales as F^N, where F is the number of fluorophores and N is the number of barcoding hybridization rounds. For example, with 3 fluorophores and 9 rounds of hybridization, 3⁹=19,683 barcodes are available. This method achieves single-molecule sensitivity, which allows gene expression to be determined by counting each identified mRNA molecule on the surface.

As shown in FIG. 2a), in each round of hybridization, multiple probes (e.g., 20-30) were hybridized to each transcript, imaged, and then stripped by DNase I treatment before another set of probes was used in the next round of hybridization. Probes with different binding sequences can be used in different rounds of hybridization. Alternatively, the same probe sequences can be used in different rounds of hybridization, with the probes coupled to different fluorophores. A barcode for the mRNA is formed by combining the fluorescent colors associated with the probes in each round of hybridization according to the order in which the colors are detected. In the example illustrated in FIG. 2a, the barcode for the mRNA over N rounds of hybridization is {purple-blue- . . . -green}, with each color representing a particular hybridization round.

FIG. 2b) illustrates exemplary composite four-color FISH images from 3 rounds of hybridization on multiple yeast cells. As an illustration, the same small region is magnified to reveal colored spots representing binding of probes to mRNA transcripts. Colored spots corresponding to the same target in different rounds of hybridization are matched to render a color barcode for the target. In each round of hybridization, the same spots are detected, but the dye associated with the transcript can change. The identity of an mRNA is encoded in the temporal sequence of dyes hybridized. For example, three rounds of hybridization created the barcodes in FIG. 1c): mRNA 1: Yellow-Blue-Yellow; mRNA 2: Green-Purple-Green; mRNA 3: Purple-Blue-Purple; and mRNA 4: Blue-Purple-Blue.
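One way to picture the matching step is sketched below. Spots detected in the first round serve as anchors, each later round contributes the color of the nearest spot within a distance tolerance, and the resulting color sequence is looked up in a codebook built from the four example barcodes above. The nearest-neighbor rule, the `tol` parameter, and the function name `decode_spots` are assumptions for illustration; the source does not specify how spots are matched across rounds.

```python
import math

# Codebook taken from the four example barcodes in the text.
CODEBOOK = {
    ("Yellow", "Blue", "Yellow"): "mRNA 1",
    ("Green", "Purple", "Green"): "mRNA 2",
    ("Purple", "Blue", "Purple"): "mRNA 3",
    ("Blue", "Purple", "Blue"): "mRNA 4",
}

def decode_spots(rounds, tol=1.0):
    """Hypothetical decoder.

    rounds: one list per hybridization round, each holding (x, y, color) spots.
    Returns (x, y, name) per first-round spot; name is None if some round has no
    spot within `tol` pixels or the color sequence is not in the codebook.
    """
    decoded = []
    for x, y, color in rounds[0]:
        colors = [color]
        for spots in rounds[1:]:
            # Nearest spot in this round to the anchor position.
            nearest = min(spots, key=lambda s: math.hypot(s[0] - x, s[1] - y), default=None)
            if nearest is None or math.hypot(nearest[0] - x, nearest[1] - y) > tol:
                colors = None
                break
            colors.append(nearest[2])
        name = CODEBOOK.get(tuple(colors)) if colors is not None else None
        decoded.append((x, y, name))
    return decoded

rounds = [
    [(10.0, 10.0, "Yellow"), (30.0, 5.0, "Blue")],
    [(10.3, 9.8, "Blue"), (29.8, 5.1, "Purple")],
    [(9.9, 10.2, "Yellow"), (30.2, 4.9, "Blue")],
]
print(decode_spots(rounds))
```

A fixed pixel tolerance is the simplest choice; it relies on the targets staying spatially fixed between rounds, which is why immobilization is emphasized elsewhere in this description.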

As noted above, the method of FIGS. 1c and 2 significantly increases coding capacity. Based on 5 different visual signals and 7 rounds of hybridization experiments, a total of 5⁷=78,125 targets (e.g., genes or mRNA transcripts) can be uniquely coded.

However, the types of color signals that can be used in these sequential hybridization experiments are limited. Encoding the human genome, which includes about 20,000 protein-coding genes, with 5 visual signals requires at least 7 rounds of hybridization experiments. This many hybridization rounds results in long and complex barcodes that are hard to distinguish from each other, leading to errors and inaccuracies. Including error correction in the barcodes requires at least 8 rounds of hybridization experiments, leading to even more complex barcodes and a greater likelihood of errors and inaccuracies.
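The F^N scaling and the round counts quoted above can be reproduced with a short helper. This sketch assumes, as the text implies, that each round of error correction costs one additional hybridization round; the function names are illustrative.

```python
def capacity(n_signals: int, n_rounds: int) -> int:
    """Number of distinct barcodes, F**N, for F signals over N rounds."""
    return n_signals ** n_rounds

def min_rounds(n_signals: int, n_targets: int, ecc_rounds: int = 0) -> int:
    """Fewest rounds N with n_signals**N >= n_targets, plus any rounds reserved
    for error correction (assumed here to cost one extra round each)."""
    n = 1
    while capacity(n_signals, n) < n_targets:
        n += 1
    return n + ecc_rounds

print(capacity(3, 9))            # 19683
print(capacity(5, 7))            # 78125
print(min_rounds(5, 20_000))     # 7, since 5**6 = 15,625 < 20,000
print(min_rounds(5, 20_000, 1))  # 8
```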

Efforts to create more types of visual signals have not led to significant improvement. Further improvements are needed to increase coding capacity, enhance detection accuracy and sensitivity, and reduce errors.

In one aspect, a novel barcoding scheme is provided. Here, instead of developing additional colors that are suitable for probe design, a counter-intuitive approach is adopted: pseudo-colors are created to expand barcoding capacity. Although the term “pseudo-color” is used, this coding scheme is not limited to colors; symbols, letters, numbers, 2D barcodes, 3D barcodes, and combinations thereof can be used to uniquely identify molecular targets. As disclosed herein, the number of pseudo-colors far exceeds the number of actual colors that are associated with the detection probes used in the hybridization experiments.

FIG. 3A illustrates exemplary aspects that may contribute to error correction during a sequential hybridization process. Such aspects include but are not limited to sample processing 302, probe design, barcode design, hybridization, image collection, signal removal and re-hybridization, and data analysis. In practice, any or all of these aspects can contribute to the quality of the analysis.

Barcoding by sequential hybridization includes multiple rounds of hybridization. Each round of hybridization is in turn a multi-step process that includes most or all of the aspects outlined above. Errors and inaccuracies can be introduced at any step during any round of hybridization. Such errors can lead to misidentification of target genes in a sample.

Prior to hybridization, samples that will be subject to analysis are processed. The main purpose of such processing is to immobilize target molecules, for example mRNAs, chromosomal DNAs, and proteins. It is essential that the target molecules remain spatially fixed through different rounds of hybridization.

Probe design contributes to the specificity of binding between the probes and target sequences. Hybridization chain reaction can be applied to allow multiple probes to bind at the same target sequence and thereby amplify detectable signals. Additionally, as illustrated in FIGS. 26 and 27, a cleavable linker can be inserted between the binding sequence (which binds a target sequence) and the signal moiety (which emits visible signals) of a probe. Here, errors can be reduced because the probes do not need to be removed for the next round of hybridization; instead, only the visible signals are switched.

Barcodes implemented during the analysis are unique. Nonspecific binding or other mistakes can render the results from one or more rounds of hybridization unreliable. A simple solution is to remove unreliable data. However, if data from one or more rounds of hybridization are eliminated from the analysis, some of the barcodes may become indistinguishable from each other.

During and after hybridization of probes to target sequences, there are also aspects that are important for improving the quality of the sequential analysis. For example, the hybridization conditions should be designed to avoid non-specific binding, which can also be achieved through sample processing and probe design. Similarly, image collection can be affected by a number of factors, including sample processing, probe design and barcode design. As described above, probes with too many types of color signals over many rounds of hybridization can lead to barcodes that are hard to resolve, or even to errors.

Between hybridization rounds, old probes can be removed before new probes are added. The removal process can also introduce errors. For example, if the removal conditions are too harsh, the immobilized biological samples can be disturbed, and as a result the positions of visual signals would change between images from different rounds of hybridization experiments.

Some of the errors may be corrected or reduced by data analysis. For example, in most scenarios, binding between probes and target sequences is observed as bright colored spots over a relatively darker background where no binding is observed. The spots are brightest in the center and fade toward the edges. Gaussian distribution analysis (e.g., fitting each spot with a Gaussian profile) can be used to focus on the most significant portion of an image, leading to better resolved image data. In addition, noise reduction can be used to reduce background signals.
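As a minimal sketch of Gaussian spot localization (the source does not specify a fitting procedure, so this is not presented as its method), the function below fits an axis-aligned 2D Gaussian to a background-subtracted image patch. Taking the logarithm of the intensities turns the Gaussian into a quadratic, which ordinary least squares can solve. It assumes NumPy, a single spot per patch, and negligible noise; a production pipeline would more likely use nonlinear least squares with an explicit background term.

```python
import numpy as np

def fit_spot_center(patch, min_fraction=0.05):
    """Sub-pixel (row, col) center of one spot via a log-linearized 2D Gaussian fit.

    For an axis-aligned Gaussian, ln I = a + b*r + c*k + d*r**2 + e*k**2,
    so the center is r0 = -b / (2d), k0 = -c / (2e).
    """
    patch = np.asarray(patch, dtype=float)
    rows, cols = np.indices(patch.shape)
    # Use only pixels well above zero; the log of dim pixels is dominated by noise.
    mask = patch > min_fraction * patch.max()
    r = rows[mask].astype(float)
    k = cols[mask].astype(float)
    z = np.log(patch[mask])
    design = np.column_stack([np.ones_like(r), r, k, r ** 2, k ** 2])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    _, b, c, d, e = coef
    return -b / (2 * d), -c / (2 * e)

# Synthetic noise-free spot centered at (7.3, 5.6) with sigma = 1.5 pixels.
rr, kk = np.indices((15, 15))
spot = 100.0 * np.exp(-((rr - 7.3) ** 2 + (kk - 5.6) ** 2) / (2 * 1.5 ** 2))
print(fit_spot_center(spot))
```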

FIG. 3B depicts an exemplary process for performing a pseudo-color based barcoding scheme. Exemplary embodiment 300 provides an example of a pseudo-color based barcoding process. Here, instead of using a large number of actual detectable color signals, which are unavailable or impractical to obtain, a small number of color probes are used repeatedly in multiple serial hybridization experiments in which non-overlapping sets of molecular targets are analyzed. Images from the serial hybridization experiments are combined to form a composite image representing a single barcoding round. In the composite image, at least some of the would-be redundant colors are replaced with predefined symbols in a pseudo-color based scheme. The process is repeated multiple times to produce the final multi-component barcodes.
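The composite step can be sketched as a relabeling. If a barcoding round consists of several serial hybridizations, each imaged in a few fluorophore channels, every detected spot can be given a pseudo-color index that combines its serial-hybridization index and its channel. The index formula `hyb_index * n_channels + channel` and the data layout are assumptions chosen for illustration; the source states only that the serial images are combined into one composite per barcoding round.

```python
from typing import List, Tuple

Spot = Tuple[float, float, int]  # (x, y, channel or pseudo-color index)

def composite_round(serial_hybs: List[List[Spot]], n_channels: int = 3) -> List[Spot]:
    """Merge one barcoding round into a single pseudo-color composite.

    serial_hybs[i] holds the spots detected in serial hybridization i; each
    spot's channel is replaced by a pseudo-color index in
    0 .. len(serial_hybs) * n_channels - 1.
    """
    composite = []
    for hyb_index, spots in enumerate(serial_hybs):
        for x, y, channel in spots:
            if not 0 <= channel < n_channels:
                raise ValueError(f"channel {channel} out of range")
            composite.append((x, y, hyb_index * n_channels + channel))
    return composite

# 4 serial hybridizations x 3 channels -> 12 pseudo-colors (0..11) in this round.
round_1 = [[(1.0, 1.0, 0)], [], [(2.0, 2.0, 2)], [(3.0, 3.0, 1)]]
print(composite_round(round_1))
```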

At step 302, the molecular targets in a biological sample that need to be barcoded are identified. In particular, the number of molecular targets and their respective identities, such as the name and sequence information of each molecular target, are identified to provide information for the design of specific binding probes.

Prior to actual hybridization experiments, the molecular targets in the biological sample are immobilized, for example, on a glass coverslip. As disclosed herein, exemplary biological samples include but are not limited to a tissue sample, a cell sample, a cell extract sample, a nucleic acid sample, an RNA transcript sample, a protein sample, or an mRNA sample.

At step 304, the number of barcoding rounds is decided depending on the total number of biological molecular targets. Because the number of barcoding rounds corresponds to the size of the resulting barcode, the user should decide how many rounds are needed based on the number of molecular targets. In some embodiments, a user may take into consideration the types of color probes that are available. In some embodiments, three or more barcoding rounds will be used. In some embodiments, four or more barcoding rounds will be used. In some embodiments, five or more barcoding rounds will be used. In some embodiments, six or more barcoding rounds will be used. As disclosed herein, one advantage of the method is that it provides relatively simple barcodes that are more error-resistant. In general, a five-component barcode is preferable to an eight-component barcode.

At step 306, the number of symbols in a pseudo-color scheme is determined based on the total number of molecular targets (e.g., N>>2) and the intended number of barcoding rounds (e.g., n≥2). As disclosed herein, the minimum number of pseudo-color symbols (e.g., S) can be determined according to the following equation:

$S \geq \sqrt[n]{N}, \quad \text{where } S,\ n \text{ and } N \text{ are all integers.} \qquad (1)$

In some embodiments, one or more error correction coding rounds (e.g., x rounds, where x≥1) are implemented. In that case, one may choose a higher number of pseudo-color symbols to avoid too many barcoding rounds. When one or more error correction rounds are used, the minimum number of pseudo-color symbols can be determined according to the following equation:

$S \geq \sqrt[n-x]{N}, \quad \text{where } S,\ n,\ x \text{ and } N \text{ are all integers.} \qquad (2)$
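Equations (1) and (2) can be evaluated exactly with integer arithmetic, which avoids floating-point error when N is a perfect power. The sketch below is illustrative; the function name is hypothetical, and the target count of 20,000 is used only as an example, matching the approximate number of human protein-coding genes cited earlier in this description.

```python
def min_symbols(n_targets: int, n_rounds: int, ecc_rounds: int = 0) -> int:
    """Smallest integer S with S ** (n_rounds - ecc_rounds) >= n_targets.

    ecc_rounds = 0 gives Eq. (1); ecc_rounds = x >= 1 gives Eq. (2).
    Integer checks correct any floating-point error in the root estimate.
    """
    k = n_rounds - ecc_rounds
    if k < 1:
        raise ValueError("need more barcoding rounds than error-correction rounds")
    s = max(1, round(n_targets ** (1.0 / k)))    # floating-point first guess
    while s ** k < n_targets:                    # step up if the guess is too small
        s += 1
    while s > 1 and (s - 1) ** k >= n_targets:   # step down if it is too large
        s -= 1
    return s

print(min_symbols(20_000, 4))     # 12, since 11**4 = 14,641 < 20,000 <= 12**4
print(min_symbols(20_000, 4, 1))  # 28, since 27**3 = 19,683 < 20,000 <= 28**3
print(min_symbols(1_728, 4, 1))   # 12
```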

Although the term pseudo-color is used, the actual codes used in barcodes are not limited to colors. Any symbol or combination of symbols can be used as codes in a barcode, so long as each such symbol or combination of symbols is unique. Exemplary symbols or combinations of symbols include but are not limited to colors, numbers, letters, shapes, or combinations thereof. It would be understood that a color based scheme would be preferred because

At step 308, once the total number of symbols to be used in a pseudo-color barcoding scheme and the number of barcoding rounds are determined, non-redundant, unique barcodes are created. For example, in an n-component barcode using S pseudo-color symbols, a barcode can be expressed as {B₁, B₂, . . . , B_n}, where each of B₁, B₂ through B_n is selected from the S pseudo-color symbols. Standard code design algorithms can be used such that the resulting barcodes are unique.
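A minimal sketch of the barcode space {B₁, B₂, . . . , B_n}: taking the Cartesian product of the symbol set with itself n times yields every possible n-component barcode, all distinct and S^n in number. This is the unconstrained space before any error-correction filtering; the function name is illustrative.

```python
from itertools import product

def all_barcodes(symbols, n):
    """Every n-component barcode (B1, ..., Bn) with each Bi drawn from symbols."""
    return [tuple(code) for code in product(symbols, repeat=n)]

codes = all_barcodes(range(12), 4)   # 12 pseudo-color symbols, 4 components
print(len(codes), len(set(codes)))   # 20736 20736
print(codes[:3])                     # [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 2)]
```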

During hybridization analysis, barcodes can be shortened due to loss of data from one or more rounds of hybridization. For example, after the loss of one round of hybridization, n-component barcodes become (n−1)-component barcodes.

When an error correction algorithm is not implemented during probe design, the resulting (n−1)-component barcodes may include redundant codes, making it difficult or impossible to decode targets and rendering the coding scheme useless. In a more specific example using colors as symbols in a pseudo-color coding scheme, the two distinct six-component barcodes {Red-Blue-Green-Red-Yellow-Blue} and {Red-Blue-Green-Green-Yellow-Blue} both become {Red-Blue-Green-Yellow-Blue} if data from the fourth hybridization round become unavailable. As a result, the results from the entire set of experiments would be discarded, wasting time and resources.
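The collision in this example can be detected mechanically by removing one component from every barcode and grouping the shortened codes. The helper below is a hypothetical check, not part of the source; it reproduces the Red/Blue/Green example with the fourth round (index 3) dropped.

```python
def collisions_after_drop(barcodes, dropped):
    """Group barcodes that become identical when component `dropped` (0-based) is lost."""
    groups = {}
    for code in barcodes:
        shortened = code[:dropped] + code[dropped + 1:]
        groups.setdefault(shortened, []).append(code)
    return {short: codes for short, codes in groups.items() if len(codes) > 1}

a = ("Red", "Blue", "Green", "Red", "Yellow", "Blue")
b = ("Red", "Blue", "Green", "Green", "Yellow", "Blue")
# Losing the fourth hybridization round corresponds to index 3.
print(collisions_after_drop([a, b], dropped=3))
```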

The current disclosure also creates barcodes that are resistant to dropped rounds and errors. In some embodiments, one or more rounds of error correction can be implemented. Additional code design algorithms can be applied such that the resulting barcodes are error-resistant; for example, the loss of one or more rounds of hybridization data still yields shortened barcodes, but those shortened barcodes remain unique and can be used to specifically identify a molecular target.
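The source does not specify which code design algorithm it uses. One standard construction, assumed here for illustration, is a single parity check over the symbol alphabet: the first n−1 components are chosen freely and the last is set to their sum modulo S. Any two such barcodes differ in at least two positions, so dropping any single round leaves the shortened barcodes unique, and the missing symbol can be restored when the dropped round is known. With S=12 and n=4 this construction yields 12³=1728 barcodes, matching the count given for one error-correction round. All function names are hypothetical.

```python
from itertools import product

def parity_barcodes(S, n):
    """Barcodes whose last symbol is the sum of the first n-1 symbols mod S."""
    return [head + (sum(head) % S,) for head in product(range(S), repeat=n - 1)]

def survives_single_drop(codes):
    """True if the codes stay unique after removing any one component."""
    n = len(codes[0])
    return all(
        len({c[:i] + c[i + 1:] for c in codes}) == len(codes) for i in range(n)
    )

def restore(shortened, dropped, S):
    """Rebuild the full barcode when component `dropped` (0-based) is missing."""
    n = len(shortened) + 1
    if dropped == n - 1:                       # the check symbol itself was lost
        return shortened + (sum(shortened) % S,)
    data, check = list(shortened[:-1]), shortened[-1]
    missing = (check - sum(data)) % S          # parity equation solved for the gap
    return tuple(data[:dropped] + [missing] + data[dropped:]) + (check,)

codes = parity_barcodes(12, 4)
print(len(codes), survives_single_drop(codes))       # 1728 True
full = (3, 7, 11, 9)                                 # 3 + 7 + 11 = 21, and 21 mod 12 = 9
print(restore((3, 11, 9), dropped=1, S=12) == full)  # True
```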

In some embodiments, implementing an error correction mechanism involves removing barcodes that are vulnerable to loss of data. As such, the number of targets that can be detected using the error-corrected scheme will be lower than that of the original scheme without error correction. To detect the same number of genes, more hybridization rounds would generally be needed.

The current method is advantageous in that it does not require additional hybridization rounds. Instead, the coding capacity of the same number of hybridization rounds can be increased by designating more pseudo-color symbols.

At step 308, barcodes for an entire set of hybridization experiments are created, each corresponding to a particular molecular target. For example, a set of hybridization experiments may include 4 barcoding rounds, each of which in turn includes 4 serial hybridization experiments. In each hybridization experiment, 3 different detectable signals (such as three different fluorophores) are associated with probes that bind to a set of molecular targets, giving 4×3=12 pseudo-colors per barcoding round. Here, each detectable signal in the same serial hybridization experiment can be associated with multiple molecular targets, for example hundreds or even more. For example, 100 or more molecular targets can be assigned barcodes that start with the same pseudo-color symbol; i.e., all of these molecular targets can be labeled with the same detectable signal in the first barcoding round. The unique identity of each molecular target is then reflected by the symbols used in the subsequent three positions of its barcode. Completing the barcoding in this example requires a total of 48 hybridization experiments over four barcoding rounds (see FIG. 10). These 48 hybridization experiments can detect a total of 20,736 (12⁴) molecular targets based on 12 pseudo-colors. If one error correction round is implemented, up to 1728 (12³) molecular targets can be detected. If 1728 different mRNA transcripts are to be detected with error correction, 1728 unique barcodes with an embedded error-correction mechanism for one round of data loss will be created. Actual hybridization experiments will be carried out based on these predesigned barcodes.
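The counts in this example follow from the round structure. The sketch below assumes, consistent with the text, that the figure of 48 counts every fluorophore channel of every serial hybridization (4 barcoding rounds × 4 serial hybridizations × 3 channels) and that each barcoding round contributes 4×3=12 pseudo-colors; variable names are illustrative.

```python
barcoding_rounds = 4
serial_hybs_per_round = 4
channels = 3  # distinct fluorophores per hybridization

pseudo_colors = serial_hybs_per_round * channels                 # 12 per round
readouts = barcoding_rounds * serial_hybs_per_round * channels   # 48
capacity = pseudo_colors ** barcoding_rounds                     # 20,736 = 12**4
capacity_one_ecc = pseudo_colors ** (barcoding_rounds - 1)       # 1,728 = 12**3

print(pseudo_colors, readouts, capacity, capacity_one_ecc)
```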

At step 310, after unique barcodes are generated, each identified molecular target will be assigned one of the unique barcodes. For example, for a five-component barcode based on 12 pseudo-color symbols, the barcodes will be generated using the 12 symbols. The first position can be any one of the 12 symbols, but each subsequent symbol and each subsequent barcode will be generated taking into consideration the symbols and barcodes that have already been used, in order to avoid redundant barcodes. In some embodiments, additional error correction algorithms are implemented in barcode design such that the resulting barcodes are resistant to mistakes such as the loss of data from an entire barcoding round.
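Assigning barcodes to targets and later decoding a barcode in which one round was lost might look like the sketch below. It assumes the same illustrative modular-sum check symbol as a stand-in for the unspecified error correction algorithm; `assign_barcodes`, `decode` and the target names are hypothetical, and a missing round is represented as `None`.

```python
from __future__ import annotations

from itertools import product


def assign_barcodes(targets: list[str], n_symbols: int = 12,
                    n_rounds: int = 5) -> dict[tuple[int, ...], str]:
    """Give each target a unique barcode whose last position is a check symbol.

    Hypothetical scheme: check symbol = sum of the other symbols mod n_symbols.
    """
    heads = product(range(n_symbols), repeat=n_rounds - 1)
    codes = (head + (sum(head) % n_symbols,) for head in heads)
    codebook = dict(zip(codes, targets))
    if len(codebook) < len(targets):
        raise ValueError("not enough barcodes for the requested targets")
    return codebook


def decode(observed: tuple, codebook: dict, n_symbols: int = 12):
    """Look up a barcode in which at most one round may be missing (None)."""
    missing = [i for i, s in enumerate(observed) if s is None]
    if len(missing) > 1:
        return None  # more than one lost round cannot be recovered
    filled = list(observed)
    if missing:
        i = missing[0]
        if i == len(observed) - 1:
            # The check round was lost: recompute it from the other rounds.
            filled[i] = sum(observed[:-1]) % n_symbols
        else:
            # A data round was lost: solve for it using the check symbol.
            others = sum(s for j, s in enumerate(observed[:-1]) if j != i)
            filled[i] = (observed[-1] - others) % n_symbols
    return codebook.get(tuple(filled))


book = assign_barcodes(["GeneA", "GeneB", "GeneC"])
print(decode((0, 0, 0, None, 1), book))  # GeneB
```

A greedy search that checks each new barcode against those already used, as the text describes, would also work; the systematic check-symbol layout simply makes both generation and recovery a one-line calculation.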

In some embodiments, probes can be synthesized according to the predetermined probe designs. Each probe can include a binding sequence that specifically targets a site in an intended molecular target. For each molecular target, multiple probes with different binding sequences targeting different sites of the same molecular target can be synthesized. When probes with different binding sequences targeting the same molecular target are used in one hybridization round, the probes will be associated with the same detectable signal.

In some embodiments, a probe can include one or more readout sequences that can be connected, directly or indirectly, to either side of the binding sequence. In some embodiments, one or more readout sequences are directly connected to the binding sequence as one or more overhangs. In some embodiments, one or more readout sequences are indirectly connected to the binding sequence of the probe via one or more intermediate molecules; for example, as one or more overhangs of a bridge probe. Exemplary intermediate molecules include but are not limited to an RNA bridge probe, a DNA bridge probe, a protein bridge probe, a probe for hybridization chain reaction (HCR), a hairpin nucleic acid probe, an HCR initiator, an HCR polymer, components of other known amplification methods, or combinations thereof.

In some embodiments, the number of different readout sequences is the same as the number of barcoding rounds. In some embodiments, the number of readout sequences is greater than the number of barcoding rounds. In some embodiments, each type of readout sequence, when activated, is capable of generating one color signal. In some embodiments, different readout sequences, when activated, are capable of generating the same color signal. Any detectable signals suitable for hybridization experiments can be used.

For five barcoding rounds, five groups of probes can be designed and synthesized for each molecular target. Probes in each group would include binding sequences that target specific sequences within a molecular target. In some embodiments, probes in the same group will bind to the same target sequence. In other, more preferred embodiments, probes in the same group will bind to multiple target sequences. As disclosed herein, in each barcoding round, 1 to 100 probes can be used to target a specific molecular target. In some embodiments, 100 or fewer, 80 or fewer, 60 or fewer, 50 or fewer, 40 or fewer, 30 or fewer, 20 or fewer, or 10 or fewer probes can be used. In some embodiments, between 5 and 40 probes are used.

In some embodiments, these probes will bind to two or more target sequences, three or more target sequences, four or more target sequences, five or more target sequences, or six or more target sequences in the same molecular target. In some embodiments, the probes will be concentrated in one area of the molecular target. In some embodiments, the probes will be spread out along the entire length of the molecular target.

At step 310, all probes required to carry out the entire set of hybridization experiments will be synthesized, each associated via direct or indirect connection with a primary binding sequence. For example, a set of hybridization experiments including five barcoding rounds, each including 12 serial hybridizations, will have a total of 60 hybridization experiments. After step 310, all probes required for each hybridization will have been designed and synthesized.

At step 312, serial hybridization experiments can be performed using the probes synthesized in the previous step. Barcodes associated with these probes are stored in a database or library. Each probe will be associated with a code in the designated barcode for the molecular target of the probe. The number of hybridization experiments within a serial hybridization round will be determined by the number of different types of detectable signals (such as colors) available and the total number of symbols in the pseudo-color barcoding scheme. As an example, hybridization experiments based on a 60 pseudo-color scheme will be carried out using 4 different types of color signals. Then 5 rounds of barcoding with the 60 pseudo-colors will be carried out, for 60⁵ = 7.776×10⁸ possible barcodes. In some embodiments, 15 serial hybridization rounds will be performed in each barcoding round. In each of the 15 hybridization rounds, all four color signals are used, each identifying a different molecular target. Here, after the particular round of serial hybridization experiments, 60 (15 × 4) unique molecular targets will be labeled with color signals. In some embodiments, 20 serial hybridization rounds will be performed using three of the four color signals, each color representing a different molecular target. Again, after the particular round of serial hybridization experiments, 60 (20 × 3) unique molecular targets are decoded and each is associated with a previously assigned pseudo-color symbol. In some embodiments, any combination of three colors can be used in the serial hybridization experiments. In some embodiments, two to four color signals can be used in each hybridization round, each representing a different molecular target. In some embodiments, one color signal can be used in one or more hybridization experiments in a serial hybridization round. Here, the same color must be present in separate hybridization experiments in order to represent different molecular targets.
For example, if one is limited to only one type of color signal, the hybridization experiment can be carried out 60 times in each barcoding round, with each hybridization experiment identifying only one molecular target. It would be understood that performing a high number of hybridization experiments may not be desirable because the biological sample being analyzed can be degraded, for example, by enzymes that digest nucleic acids.
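The relationship between the number of pseudo-colors, the number of real detectable signals, and the number of serial hybridizations per barcoding round can be sketched as follows. The mapping of pseudo-color k to a (serial hybridization, signal) pair in round-robin order is an illustrative assumption, and the signal names are hypothetical placeholders.

```python
import math


def serial_hybs_needed(n_pseudocolors: int, n_signals: int) -> int:
    """Serial hybridizations per barcoding round to realize every pseudo-color."""
    return math.ceil(n_pseudocolors / n_signals)


def pseudocolor_schedule(n_pseudocolors: int, signals: list) -> dict:
    """Map pseudo-color k (1-based) to (serial hybridization number, signal)."""
    schedule = {}
    for k in range(n_pseudocolors):
        hyb, channel = divmod(k, len(signals))
        schedule[k + 1] = (hyb + 1, signals[channel])
    return schedule


print(serial_hybs_needed(60, 4), serial_hybs_needed(60, 3), serial_hybs_needed(60, 1))
# 15 20 60
print(60 ** 5)  # 777600000 barcodes over 5 barcoding rounds

plan = pseudocolor_schedule(60, ["dye1", "dye2", "dye3", "dye4"])
print(plan[1], plan[60])  # (1, 'dye1') (15, 'dye4')
```

The single-signal case makes the trade-off in the text concrete: one real color means one serial hybridization per pseudo-color, so 60 hybridizations per barcoding round.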

In some embodiments, reference visual signals are introduced during each hybridization experiment in each serial hybridization round. These reference visual signals are associated with one or more alignment references that are associated with a biological sample and do not change their positions between hybridization rounds. These alignment references can serve as a standard for subsequent image analysis. They can be part of the biological sample or external material added to and immobilized with the biological sample. In some embodiments, alignment references are immobilized at the same time as the biological sample. In some embodiments, alignment references are immobilized at a different time from when the biological sample is immobilized. Exemplary alignment references include but are not limited to beads, oligonucleotide sequences immobilized on the coverslips and detected by a complementary oligonucleotide, microscopic objects (e.g., a metal bead, a gold bead, a polystyrene bead loaded with a dye), PCR handle sequences on the primary probe, or combinations thereof. As disclosed herein, the reference signals are fiducial markers for alignment between the different serial and barcoding hybridizations. They are present throughout all of the hybridizations. In some embodiments, at least one marker is present per image. They can be probes targeting all of the primary probes (as shown in FIG. 18a, yellow probes), such that all the dots are observed in all of the hybridizations. In some embodiments, the reference signals can be produced from beads attached to the cover glass, or from other molecules or particles. In some applications where “super-resolution” capabilities are used, signals from the beads can be Gaussian fitted to nanometer resolution, and images between hybridizations can be aligned to that precision.
In some embodiments, this becomes useful for the in situ application, where Gaussian fitting allows more molecular targets to be detected in each pseudo-color and then collapsed into the composite image for barcoding.

At step 314, an image of visible probes bound to the biological sample will be taken after each hybridization experiment in a serial hybridization round. All images from the hybridization experiments in a serial hybridization round will be aligned using the reference visual signals to create a composite image.
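A minimal sketch of fiducial-based alignment for building one composite image, assuming the drift between hybridizations is a pure translation (no rotation or scaling) and that bead and spot coordinates have already been localized. Each bead in a later image is paired with its nearest reference bead, and the median offset is applied to that image's spots; the function names and the `max_dist` cutoff are hypothetical.

```python
from statistics import median


def estimate_shift(ref_beads, moving_beads, max_dist=5.0):
    """Estimate the (dy, dx) translation that maps moving_beads onto ref_beads.

    Each moving bead is paired with its nearest reference bead; pairs farther
    apart than max_dist are ignored, and the median offset is returned.
    """
    dys, dxs = [], []
    for my, mx in moving_beads:
        ry, rx = min(ref_beads, key=lambda p: (p[0] - my) ** 2 + (p[1] - mx) ** 2)
        if (ry - my) ** 2 + (rx - mx) ** 2 <= max_dist ** 2:
            dys.append(ry - my)
            dxs.append(rx - mx)
    if not dys:
        raise ValueError("no fiducial pairs found within max_dist")
    return median(dys), median(dxs)


def build_composite(ref_beads, hyb_images):
    """Merge spots from the serial hybridizations of one barcoding round.

    hyb_images is a list of (bead_coords, spots), where each spot is
    (y, x, pseudo_color). Spots are shifted into the reference frame.
    """
    composite = []
    for beads, spots in hyb_images:
        dy, dx = estimate_shift(ref_beads, beads)
        composite.extend((y + dy, x + dx, pc) for y, x, pc in spots)
    return composite


ref = [(10.0, 10.0), (50.0, 20.0), (30.0, 70.0)]
moved = [(8.5, 12.0), (48.5, 22.0), (28.5, 72.0)]
print(estimate_shift(ref, moved))  # (1.5, -2.0)
```

The median is used rather than the mean so that a single mis-paired or mislocalized bead does not pull the whole image off; a full implementation might also fit rotation or use sub-pixel bead positions from Gaussian fitting.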

At step 316, it will be determined whether one or more barcoding rounds remain, based on the barcodes pre-designed at step 308. If yes, the method returns to steps 312 and 314, and more serial hybridization experiments are carried out to constitute another barcoding round and produce another composite image.

If no, the hybridization portion of the analysis is complete and the method proceeds to step 318. Images and composite images will be subject to further data analysis. For example, in a sample with limited space, e.g., in a single cell, Gaussian localization analysis can be carried out to generate, for example, more focused color spots, each representing binding to a molecular target. Other data processing methods can also be used to enhance data quality; for example, a de-noising step can be applied to reduce background signals.

The method concludes at step 320.

FIG. 4 depicts a schematic overview of Sequential Probing Of Targets (SPOTs), targeting RNA samples using a pseudo-color barcode scheme. The method performs transcriptome-level profiling of mRNAs with single-molecule sensitivity and high accuracy using a method based on sequential FISH (seqFISH) [Lubeck 2014]. The initial demonstration of seqFISH showed promising coding capacity to barcode a large number of molecular species in cells and tissues [Shah 2016]; however, a major limitation has been identified: when seqFISH is performed in cells, the optical diffraction limit prevents many mRNAs from being resolved simultaneously due to the limited space in a cell.

As shown in FIG. 4, there are a total of 5 barcoding rounds, and each barcoding round has four serial hybridizations. Three different fluorophores (e.g., red, green and blue) are used in each serial hybridization experiment. For simplicity, each serial hybridization experiment in FIG. 4b only shows one molecular target for each fluorophore. However, in practice, each fluorophore can represent tens, hundreds or even more molecular targets. In addition, unlike the illustration in FIG. 4b, which shows the same number of targets tagged in each serial hybridization, different numbers of molecular targets can be tagged in different serial hybridizations. For example, for a 1,500 mRNA experiment with 4 barcoding rounds, serial hybridization No. 1 of barcoding round 1 can tag 400 mRNA transcripts with three different fluorophores (e.g., red, green and blue). Serial hybridization No. 2 of the same barcoding round can tag 280 mRNA transcripts with the same three fluorophores. 320 mRNA transcripts can be tagged with the same three fluorophores in serial hybridization No. 3, and 500 mRNA transcripts can be tagged with the same three fluorophores in serial hybridization No. 4, for a total of 1,500 transcripts in the barcoding round. As the numbers suggest, within each serial hybridization, multiple mRNA transcripts can be tagged with the same fluorophore. In some embodiments, a fluorophore is attached to a probe via multiple readout sequences, which in turn are connected to a primary binding sequence targeting a specific site in a molecular target. In some embodiments, the readout sequences can be connected directly to the primary sequence. In some embodiments, the readout sequences can be connected indirectly to the primary sequence, for example, via one or more intermediate molecules. In some embodiments, the one or more intermediate molecules comprise an RNA bridge probe, a DNA bridge probe, a protein bridge probe, a probe for hybridization chain reaction (HCR), a hairpin nucleic acid probe, an HCR initiator, an HCR polymer, or combinations thereof.

In addition, there can be a different number of serial hybridizations in two different barcoding rounds. For example, in the example presented above, 4 serial hybridizations are included in barcoding round 1. Barcoding round 2 can have 4, 6 or 3 serial hybridizations, so long as the results from the two sets of serial hybridizations can reveal the same 1,500 mRNA transcripts. Images from all serial hybridizations in the same barcoding round can be compiled into a composite image for the barcoding round. In this barcode scheme, 12 pseudo-colors are created (4 serial hybridizations × 3 fluorophores), which can be assigned the numbers 1 through 12. In some embodiments, 12 different colors are assigned as symbols of the 12 pseudo-colors. After all 5 barcoding rounds are completed, the 1,500 mRNA transcripts will be fully coded.

In some embodiments, the total number of targets revealed in two different barcoding rounds can differ. For example, because of experimental problems, only 1,400 mRNA transcripts might be tagged in barcoding round 1, while 1,350 mRNA transcripts are tagged in barcoding round 2. Subsequent image and data analysis can reveal that only 1,200 of the mRNA transcripts overlap between the two barcoding rounds. Ultimately, only 1,200 mRNA transcripts are fully coded in all 5 barcoding rounds.
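The bookkeeping in this example reduces to a set intersection: a target is fully coded only if it is detected in every barcoding round. A minimal sketch, assuming each round's detections are available as a collection of target identifiers (the identifiers and the function name are hypothetical):

```python
def fully_coded(detected_per_round):
    """Return the targets detected in every barcoding round.

    detected_per_round is a list with one collection of target IDs per round.
    """
    rounds = [set(r) for r in detected_per_round]
    if not rounds:
        return set()
    return set.intersection(*rounds)


round1 = {"gene1", "gene2", "gene3"}
round2 = {"gene2", "gene3", "gene4"}
print(sorted(fully_coded([round1, round2])))  # ['gene2', 'gene3']
```

With 1,400 targets in one round and 1,350 in another, only the shared subset (1,200 in the example) carries a complete barcode.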

The pseudo-color scheme can overcome this density problem in both in situ and in vitro applications. In the in vitro case, implemented with SPOTs, capturing transcripts onto an oligo(dT) surface and adjusting the dilution factors can easily remove the optical crowding problem and allow the transcriptome to be imaged by seqFISH.

In addition, pseudo-color is more efficient in terms of imaging time than all existing imaging methods, including expansion microscopy. For example, with 3 fluorophores, it takes 10 rounds of hybridization to code for the transcriptome with one round of error correction (3⁽¹⁰⁻¹⁾ = 19,683 barcodes). Thus, a total of 30 frames of imaging (3 fluorophores × 10 rounds) is required. In a pseudo-color scheme, 20 pseudo-colors can be used for 4 rounds of hybridization to code for 20⁽⁴⁻¹⁾ = 8000 genes in each of the three fluorophore channels, for a total of 24,000 genes. This requires 3 × 20 × 4 = 240 frames to image, an 8-fold increase in imaging time. However, because the density of mRNA is effectively diluted into 3 × 20 = 60 pseudo-channels instead of 3 fluorophore channels without pseudo-color coding, the density problem is alleviated by a factor of 20. The target spots can be localized to nanometer precision by Gaussian fitting to decrease the density in each pseudo-color channel before barcode alignment. Thus, the benefit of pseudo-color coding is to decrease the density of target spots in the cell while saving imaging time compared to expansion microscopy, where expanding the sample 20-fold requires an additional 20-fold increase in imaging time. Since imaging time is rate limiting in any sequential imaging method, pseudo-color solves a major problem in implementing transcriptome profiling in situ.
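The imaging-time and dilution comparison above can be reproduced with a few lines of arithmetic. This sketch assumes, as the numbers in the text imply, that one frame is acquired per color channel per hybridization and that one round is reserved for error correction; the helper names are hypothetical.

```python
def seqfish_frames(n_colors: int, n_rounds: int) -> int:
    """Frames for conventional seqFISH: one frame per color per round."""
    return n_colors * n_rounds


def pseudocolor_frames(n_channels: int, pcs_per_channel: int, n_rounds: int) -> int:
    """Frames when each barcoding round needs one hybridization per pseudo-color."""
    return n_channels * pcs_per_channel * n_rounds


def ec_capacity(n_symbols: int, n_rounds: int, n_channels: int = 1) -> int:
    """Barcodes with one round reserved for error correction, summed over channels."""
    return n_channels * n_symbols ** (n_rounds - 1)


base = seqfish_frames(3, 10)       # 30 frames
pc = pseudocolor_frames(3, 20, 4)  # 240 frames
print(ec_capacity(3, 10), ec_capacity(20, 4, n_channels=3))  # 19683 24000
print(pc / base, (3 * 20) / 3)     # 8.0 20.0
```

The same helpers cover the single-channel, 100 pseudo-color alternative discussed later: `pseudocolor_frames(1, 100, 3)` gives 300 frames, a 10-fold increase over 30, against a 100/3 (about 33.3-fold) dilution.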

In some embodiments, for in situ experiments based on a pseudo-color scheme, fewer different detectable signals can be used so that they can be resolved from each other more clearly, given the limited amount of real estate within a cell. In some embodiments, with super-resolution microscopy, even densely populated detectable signals can be resolved from each other.

As an example, three different color signals are used in an in situ experiment. In some embodiments, for each color, hybridization experiments can be repeated 20 times to generate a 20 pseudo-color image. Here, each 20 pseudo-color image is compiled from only 1 real color channel, although multiple colors can be used. The single-color approach offers several advantages during image analysis. Significantly, microscopes exhibit chromatic aberrations, and aligning images across different actual color channels can be difficult and lead to errors or inaccuracies. In the current embodiments, all 20 pseudo-colors can come from the same color channel (e.g., the Cy5 channel), so all of these images are by definition free of chromatic aberration relative to one another.

Using an mRNA hybridization experiment as an example, mRNA transcripts bound to activated color probes are visualized as dots in the image. In some embodiments, Gaussian fitting is applied to localize the RNA dots in cells and generate a higher-resolution or less dense image of the cell. In some embodiments, Gaussian localization can take place at one or more time points; for example, for each image corresponding to a serial hybridization experiment, for the images from all serial hybridization experiments in a barcoding round, or for each composite image corresponding to a barcoding round. In some embodiments, Gaussian localization can be repeated multiple times.
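The localization step can be illustrated with a simplified stand-in for full two-dimensional Gaussian least-squares fitting: find local maxima, then refine each one with the three-point Gaussian estimator, which fits a parabola to the logarithm of the peak pixel and its two neighbors along each axis. For a sampled Gaussian this recovers the sub-pixel center exactly. The sketch assumes background-subtracted images with strictly positive pixel values around each spot; the function names and threshold are hypothetical.

```python
import math


def local_maxima(image, threshold):
    """Interior pixels brighter than threshold and all 8 neighbors."""
    peaks = []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            v = image[r][c]
            if v <= threshold:
                continue
            neighbors = [image[r + i][c + j] for i in (-1, 0, 1)
                         for j in (-1, 0, 1) if (i, j) != (0, 0)]
            if all(v > n for n in neighbors):
                peaks.append((r, c))
    return peaks


def refine_peak(image, r, c):
    """Three-point Gaussian (log-parabola) sub-pixel estimate along each axis."""
    def offset(lm, l0, lp):
        denom = lm - 2.0 * l0 + lp
        return 0.0 if denom == 0 else (lm - lp) / (2.0 * denom)

    log = math.log
    dy = offset(log(image[r - 1][c]), log(image[r][c]), log(image[r + 1][c]))
    dx = offset(log(image[r][c - 1]), log(image[r][c]), log(image[r][c + 1]))
    return r + dy, c + dx


# Synthetic spot centered at (5.3, 4.7) with sigma = 1.2 pixels.
img = [[math.exp(-((r - 5.3) ** 2 + (c - 4.7) ** 2) / (2 * 1.2 ** 2))
        for c in range(11)] for r in range(11)]
peaks = local_maxima(img, 0.5)
print(peaks)  # [(5, 5)]
print([round(v, 6) for v in refine_peak(img, *peaks[0])])  # [5.3, 4.7]
```

On noisy data a full least-squares fit over a small window is more robust; the three-point form is shown because it is short and exact for an ideal Gaussian.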

In some embodiments, color beads can be used to align the pseudo-colorimages to create composite images.

When multiple color probes are used (e.g., 4 or 5 colors), the resulting image can be too dense, and it can be difficult or impossible to apply the Gaussian fitting process, which requires discrete dots. By virtually diluting the samples into 60 pseudo-colors, the density becomes much less of an issue. After the first barcoding round, DNase can be used to strip the probes, which carry the RNA-targeting sequence and one round of readout sequence. This ensures that when barcoding is done in cells (in situ), there is no nonspecific binding. A new set of probes with the 2nd barcode sequence is then hybridized. Another 20 rounds of serial hybridization with 3 colors are performed. In some embodiments, images from the hybridization experiments are subject to Gaussian fitting and are compiled into 3 sets of 20-color pseudo-color images.

This process can be repeated 4 times to generate a 20×20×20×20 barcode in each color. With one round reserved for error correction, 20³ = 8000 error-corrected barcodes are available in each color, encoding a total of 24,000 barcodes across the three colors.

In this example, the barcode scheme can actually be 3 separate sets of 8000 barcodes, one in each of the colors. There can be 60 readout sequences in total (20 per color). Because of the DNase stripping step, the readout sequences used in the 2nd barcoding hybridization can be the same as those used in the 1st barcoding hybridization.

For in vitro analysis, all the readout sequences can be added onto the primary binding probes. In those cases, the readout sequences for the different barcoding hybridizations have to be different. The approach is possible in vitro because there is very little nonspecific binding, since the biological sample has often been extracted from the cells. For in situ analysis, more stringent conditions are needed to avoid nonspecific binding in cells.

An alternative approach is to generate 100 readout sequences, all in one color, and code for 10,000 targets with 3 rounds of hybridization (100 × 100 × 100 = 10⁶ raw barcodes, or 100² = 10,000 with one round of error correction). As disclosed herein, any manageable number of readout sequences can be used; for example, 50 or fewer, 60 or fewer, 80 or fewer, 100 or fewer, 150 or fewer, 200 or fewer, 250 or fewer, 300 or fewer, 400 or fewer, or 500 or fewer. In some embodiments, more than 500 readout sequences can be used.

As disclosed herein, the pseudo-color scheme is better than expansion microscopy at solving the density issue. Expansion microscopy is a new technology for imaging biological samples in fine detail by physically making the samples bigger through a chemical process that preserves nanoscale isotropy. Expansion microscopy enables super-resolution imaging on a conventional light microscope. However, expansion is a complicated procedure, and the sample can contract and expand during the rehybridization process, which makes aligning the barcodes hard. Also, if the tissue sample is expanded 10 times, the tissue will take 10 times longer to image. So even though pseudo-color appears to require more hybridizations, the total imaging time is comparable to, or even more favorable than, that required if one had to expand the sample. For example, the 60 pseudo-color scheme is effectively a 20× expansion compared to just a 3-color scheme.

In a seqFISH experiment, 10 rounds of hybridization are needed to encode 20,000 genes with error correction using 3 colors (3⁽¹⁰⁻¹⁾ = 19,683). As disclosed herein, the pseudo-color scheme uses 20 × 4 = 80 rounds of hybridization for each channel. Thus, the number of hybridizations goes up 8-fold, but the density is diluted by a factor of 20. The current approach is therefore more efficient than expansion microscopy. For example, one can also use 1 channel, 100 pseudo-colors and three rounds of hybridization, which corresponds to a 33.3-fold (100 pseudo-colors / 3 colors) improvement in dilution and only a 10-fold increase in imaging time (100 × 3 = 300 frames versus 30). As disclosed herein, 3 color signals and 20 pseudo-colors in each channel have produced excellent results. In practice, one can balance the “dilution” factor against the number of targets being analyzed to determine the number of actual color signals and the number of pseudo-colors.

In some embodiments, the current approach can also be combined with correlation FISH (corrFISH; Coskun and Cai, Nature Methods 2016) to quantify the abundances of highly expressed mRNAs. corrFISH can be applied to decoding samples where the number of target dots is so high that they are not individually resolvable. This approach can decode the abundance of the genes, but at a cost in spatial resolution. As an example of implementing the corrFISH scheme in addition to the existing pseudo-color scheme, a 4th channel can be dedicated to barcoding highly abundant transcripts by corrFISH, in addition to the 3 color channels used for generating the pseudo-colors. The 4th channel can generate 20 pseudo-colors through 20 rounds of serial hybridization. corrFISH can then be used to code for another 8000 genes that are highly expressed.

In some implementations, primary probes can be stripped and signals extinguished after the pseudo-color scheme is completed, and probes targeting another set of targets can be hybridized. The utility of this is that multiple types of targets can be probed sequentially in the same cells. For example, 20,000 intronic RNAs can first be targeted through the pseudo-color scheme. Then 20,000 mRNAs can follow, with 20,000 proteins after that. Alternatively, probes can be used to target specific highly expressed genes to measure their localization in cells, especially polarized cells, such as in the processes of neurons. Probes can also be used to target specific isoforms or combinations of isoforms of specific targets.

The approach disclosed herein can also work with amplification methods, such as HCR, so that it can be implemented robustly in tissues.

It will be noted that although the approach above is described in connection with an in situ experiment, one or more aspects of it can be applied to in vitro experiments as well.

In one aspect, disclosed herein are readout probes with cleavable linkers. FIG. 25 depicts exemplary chemical reactions for synthesizing a readout probe with a disulfide linker.

In one aspect, sequential barcoding FISH (seqFISH) is performed by using nucleic acid readout probes that are conjugated with a signal moiety via a cleavable linker. Any suitable cleavable linker can be used, including but not limited to an enzyme-cleavable linker, a nucleophile/base-sensitive linker, a reduction-sensitive linker, a photo-cleavable linker, an electrophile/acid-sensitive linker, a metal-assisted cleavable linker, or an oxidation-sensitive linker. Exemplary linkers can be found in Leriche et al., 2012, “Cleavable linkers in chemical biology,” Bioorganic & Medicinal Chemistry 20:571-582, which is hereby incorporated herein by reference in its entirety.

In some embodiments, the cleavable linker is a disulfide linkage. In some embodiments, the cleavable linker is a nucleic acid restriction site. In some embodiments, the cleavable linker is a protease cleavage site.

An exemplary system utilizing nucleic acid readout probes is shown in FIG. 26. As depicted, a gene-specific primary probe binds to a target site, e.g., in an mRNA molecule under an in situ or in vitro setting. Besides a binding sequence, the primary probe further includes an overhang sequence at one end of the binding sequence. In some embodiments, a second overhang sequence is included at the other end of the binding sequence.

In some embodiments, an overhang sequence includes one or more target sequences to which one or more nucleic acid readout probes bind. In some embodiments, each target sequence uniquely interacts with a set of readout probes with specific readout binding sequences. As disclosed herein, an overhang sequence may include two target sequences, three target sequences, five or fewer target sequences, seven or fewer target sequences, or ten or fewer target sequences. In some embodiments, an overhang sequence may include ten or more target sequences. Similar arrangements can be implemented where there are two overhang sequences.

In some embodiments, an overhang sequence binds to a bridge probe that provides target sequences for one or more readout probes to bind, as depicted in FIG. 26. A bridge probe can be interchangeably called an intermediate bridge probe or a secondary bridge probe. A bridge probe includes a binding sequence that binds to all or a portion of an overhang sequence in a primary probe. In some embodiments, a bridge probe further includes one or more readout binding targets that are connected in series and linked to the binding sequence.

In some embodiments, as depicted in FIG. 26, two bridge probes can bind to the same primary probe via two overhang sequences.

As disclosed herein, a bridge probe may include two readout binding targets, three readout binding targets, five or fewer readout binding targets, seven or fewer readout binding targets, or ten or fewer readout binding targets. In some embodiments, a bridge probe may include ten or more readout binding targets. Similar arrangements can be implemented where there are two bridge probes bound to overhang sequences.

Exemplary rehybridization schemes utilizing the readout probes are illustrated in FIG. 26. For example, the first round of rehybridization (hyb1) begins with the hybridization of gene-specific primary probes to the target mRNA. Each gene-specific primary probe contains one or more “overhang” sequences to which the secondary bridge probes hybridize. The secondary bridges contain two or more tertiary readout binding sites, which are key to efficient and quick rehybridization. In the first hybridization, unique tertiary readout probes conjugated with blue dye are hybridized to their unique binding sites on the secondary bridge probe. Once imaged, the sample is treated with a reducing agent such as TCEP or DTT to cleave off the disulfide-linked dyes. Then, the sample is washed with wash buffers. During the second round of hybridization, a second set of unique tertiary readout probes with red dye is hybridized to its unique binding site on the secondary bridge. After two rounds of hybridization, a particular mRNA is barcoded with a color barcode of blue (round 1) and red (round 2). Additional rounds of hybridization can be applied to create more sophisticated barcoding sequences. Technically, the scaling factor of seqFISH with this rehybridization method depends on the number of available secondary bridges and their number of unique tertiary probe binding sites. For example, by incorporating 2 secondary bridges with a total of 8 unique tertiary readout binding sites (N = 8), and with 4 fluorophores (F = 4), one can generate over 64,000 unique barcodes (F^N = 4⁸ = 65,536). Moreover, in embodiments where bridge probes are used, it is possible to strip off the secondary bridges with a high concentration of formamide and flow in another unique set of secondary bridges to continue the scaling process, which further increases the upper limit of the scaling factor.
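The scaling rule F^N and the way a color barcode accumulates one dye per cleave-and-rehybridize round can be sketched as follows. The sketch assumes each readout binding site is read in its own round and that barcodes are assigned in simple enumeration order; the gene names and function names are hypothetical.

```python
from itertools import product


def readout_capacity(n_fluorophores: int, n_readout_sites: int) -> int:
    """F**N color barcodes from N readout sites, each read in its own round."""
    return n_fluorophores ** n_readout_sites


def color_codebook(genes, fluorophores, n_sites):
    """Assign each gene a distinct sequence of dye colors, one per round."""
    codes = product(fluorophores, repeat=n_sites)
    book = dict(zip(codes, genes))
    if len(book) < len(genes):
        raise ValueError("more genes than available color barcodes")
    return book


print(readout_capacity(4, 8))  # 65536
book = color_codebook(["geneX", "geneY"], ["blue", "red"], 2)
# Round 1 images a blue dye, cleavage (e.g. TCEP) removes it, round 2 images red:
print(book[("blue", "red")])  # geneY
```

Each cleavage step returns the readout sites to a dark state, so the color observed in round k comes only from the readout probe hybridized in that round.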

In one aspect, disclosed herein are methods and systems, based on hybridization chain reaction (HCR), for amplifying visual signals during each round of hybridization in sequential hybridization reactions. An exemplary embodiment of HCR is illustrated in FIG. 27A. During hybridization round 1, probes with overhang initiator sequences are added to a nucleic acid target molecule such as an mRNA or a DNA. Also added are hairpin nucleic acid probes bearing sequences complementary to those of the initiator sequences. The presence of initiator sequences causes unfolding of the hairpin nucleic acid probes and results in chain reactions that lead to self-assembled extended HCR polymers. Because each hairpin nucleic acid probe bears a signal, self-assembled extended HCR polymers result in amplification of signals and better detection of target sites.

FIG. 27B illustrates an exemplary readout probe embedded with a cleavable linker. Here, the cleavable linker is a disulfide bond. At one end of the cleavable linker, a readout probe as disclosed herein includes a binding sequence that allows it to bind to a specific nucleic acid target. In some embodiments, the nucleic acid target is an mRNA or a DNA. In some embodiments, the nucleic acid target is within an intact cell or is part of a cell extract. In some embodiments, the nucleic acid target is within a primary binding probe that directly binds to a target site in an mRNA. In some embodiments, the nucleic acid target is within a secondary binding probe that binds to a primary binding probe that directly binds to a target site in an mRNA. In some embodiments, the nucleic acid target is within a tertiary or quaternary binding probe. One of skill in the art can apply the principle to any level of binding and interaction.

At the other end of the cleavable linker, a readout probe as disclosed herein further includes an HCR initiator sequence. When exposed to hairpin nucleic acids bearing partially or completely complementary sequences, the initiator sequence can trigger a chain reaction that allows a signal motif to be formed by multiple extender probes. Each extender probe includes a signal moiety. Aggregation of multiple extender probes enhances signal detection.

An exemplary scheme for forming a signal motif with multiple extender probes during a sequential hybridization process is illustrated in FIG. 27C. During the first round of hybridization, nucleic acid detection probes with embedded cleavable linkers bind to a first target site within a nucleic acid target sequence. In some embodiments, extender probes are added after the initial binding of nucleic acid detection probes to the first target sequences. In some embodiments, extender probes form an aggregate before the aggregated polymer is added to the reaction mix and binds to the initiator sequence in the nucleic acid detection probes.

In some embodiments, extender probes are standard hairpin probes, each including a sequence that is partly or completely complementary to the initiator sequence in the readout probes. In these embodiments, extender probes are very similar or identical to each other. The size of the resulting extendible signal motif may be controlled by the concentration or absolute quantity of the extender probes added.

In some embodiments, extender probes including different types of nucleic acid sequences can be used to achieve controlled signal amplification. For example, the signal can be amplified five times if five populations of extender probes are used: {EP₁, EP₂, EP₃, EP₄, EP₅}. The first population of extender probes includes a binding sequence that binds to all or a part of the initiator sequence. The second population of extender probes includes a binding sequence that binds to a region in the first population of extender probes. The third population of extender probes includes a binding sequence that binds to a region in the second population of extender probes. The fourth population of extender probes includes a binding sequence that binds to a region in the third population of extender probes. The fifth population of extender probes includes a binding sequence that binds to a region in the fourth population of extender probes. In such embodiments of linear amplification, the size of the resulting extendible signal motif can be controlled by the number of populations of extender probes that are provided.

In some embodiments, an extender probe may include multiple binding sites for binding subsequent extender probes. For example, besides binding to the initiator sequence, EP₁ may include two or more binding sites for EP₂, thus allowing further amplification of the signal. This form of amplification may occur at any level. In the five-population example above, multiple binding sites for subsequent or downstream extender probes can be implemented in any one or combination of EP₁, EP₂, EP₃, or EP₄. For example, extender probes from EP₂, EP₃, or EP₄ can all bind to target sites in EP₁, which in turn binds to the initiator sequence.

In some embodiments, the amplification occurs at multiple levels. Generally, when m populations of extender probes are present, multiple binding sites for subsequent or downstream extender probes can be implemented in any one or combination of EP₁, EP₂, . . . , or EP_(m-1). Additionally, when multiple binding sites are present, they can be connected in series or arranged in a non-linear fashion (e.g., in a branched or circular arrangement). Depending on the number and configuration of the binding sites, the resulting extendible signal motif can be a stick, a ball, a net, or any other applicable form.
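As a rough illustration of how the number of extender-probe populations and their branching scale the signal, the sketch below counts the extender probes assembled on a single initiator sequence under idealized assumptions: one EP₁ probe binds the initiator, every binding site is occupied, and each probe contributes one unit of signal. The function name and the per-level list of binding-site counts are hypothetical and are not part of the disclosed method.

```python
def extender_probe_count(sites_per_level):
    """Idealized number of extender probes assembled on one initiator sequence.

    sites_per_level[k] is the (hypothetical) number of binding sites that each
    probe of population EP_(k+1) offers to population EP_(k+2), so m populations
    give a list of m - 1 entries. Assumes a single EP_1 probe binds the initiator
    and that every binding site is occupied.
    """
    total = probes_at_level = 1  # the single EP_1 probe on the initiator
    for sites in sites_per_level:
        probes_at_level *= sites  # probes in the next population
        total += probes_at_level
    return total


# Linear amplification with five populations (EP_1..EP_5): one site per probe.
print(extender_probe_count([1, 1, 1, 1]))  # 5
# Branched amplification: every probe offers two sites to the next population.
print(extender_probe_count([2, 2, 2, 2]))  # 31
```

With one site per probe the count grows linearly with the number of populations, matching the "five populations, five-fold" example; multiple sites at any level make the growth multiplicative, which is why branched motifs can amplify far more strongly in this idealized model.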

One of skill in the art would understand that any suitable number of populations of extender probes can be added to achieve an optimal signal-to-noise ratio for the best imaging effects. For example, the extender probes can include five or fewer, seven or fewer, 10 or fewer, 15 or fewer, 20 or fewer, 25 or fewer, 30 or fewer, 40 or fewer, or 50 or fewer populations.

In some embodiments, the extender probes are mixed together prior to being mixed with the readout probes having the initiator sequence. In some embodiments, the extender probes are sequentially added to the readout probes having the initiator sequence, where the readout probes are already bound to their nucleic acid targets.

As shown in FIG. 27C, after imaging analysis, a cleaving agent can be applied to sever the linker between the binding sequence and the initiator sequence in a readout probe. The amplified polymers can then be cleaved off and washed away.

During a second round of hybridization, new nucleic acid detection probes are applied. The new nucleic acid detection probes include a different binding sequence that binds to a second, different target site in the nucleic acid target sequence. The new nucleic acid detection probes also include a cleavable linker and an initiator sequence. The initiator sequence can be the same as or different from the initiator sequence of the previous set of nucleic acid detection probes.

New extender probes are used, as described hereinabove, to form amplified polymers that enhance signal detection. After imaging analysis, the new set of amplified polymers can be cleaved off and washed away. By using extender probes that bear different types of visual signals, barcodes can be established for nucleic acid targets. Depending on the availability of target sites within a nucleic acid target, multiple rounds of hybridization can be performed to create more complex barcodes. For example, there can be three, four, or five rounds of hybridization, or seven or fewer, 10 or fewer, 12 or fewer, 15 or fewer, 20 or fewer, 30 or fewer, 40 or fewer, or 50 or fewer rounds of hybridization.

The compositions and methods disclosed herein can be used in sequential hybridizations to identify any suitable cellular targets within an intact cell or in an in vitro setting. In some embodiments, the cellular targets can be mRNAs or DNAs. In some embodiments, the cellular targets can be proteins. For example, the initial target-binding primary probe can be an antibody conjugated with a nucleic acid sequence for subsequent binding.

As exemplified herein, provided technologies work for a wide variety of samples. For example, HCR-seqFISH worked in brain slices, and SPIM can robustly detect single mRNAs in CLARITY brain slices. In some embodiments, provided technologies are useful for profiling targets in mouse models of neurodegenerative diseases or in human brains. No other technology prior to the present invention could deliver the same quality and quantity of data.

Error Correction Mechanism

FIG. 3A illustrates general aspects of a sequential hybridization analysis that may affect the quality of the analysis. Sequential hybridization includes multiple rounds of hybridization, where each round of hybridization is a multiple-step process. Errors can be introduced at any step during any round of hybridization. Such errors can lead to misidentification of target genes in a sample.

Barcodes and Error Correction

In one aspect, disclosed herein are methods for designing barcodes with built-in error correction mechanisms such that the multi-component barcodes can withstand the loss of data from one or more rounds of hybridization (i.e., drop-safe). As disclosed herein, the terms “barcode” and “code” are used interchangeably.

As disclosed herein, by using probes that are associated with F detectable visual signals (F≥2), a sequential hybridization of N rounds (N≥2) can generate a total of F^(N) combinations of visual signals. For example, F = 5 and N = 3 give 5³ = 125 combinations. In some embodiments, these combinations of visual signals can be used as barcodes to uniquely identify cellular targets such as mRNA, DNA, or even protein.

FIG. 30 illustrates an exemplary process 3000 for generating drop-safe barcodes.

At step 3010, the total number of genes that will be analyzed during the hybridization experiments is determined. This number sets the threshold values for the number of detectable visual signals (F) and the total number of rounds in the sequential hybridization (N).

Once the total number of genes is determined, steps 3020 and 3030 are performed simultaneously. The number of genes being analyzed must not exceed the total number of possible combinations of visual signals (F^(N)). Practical aspects of the hybridization analysis need to be considered when selecting values for F and N. One would tend to reduce the number of rounds of hybridization as much as possible. Theoretically, this can be achieved by using a high number of detectable visual signals (F). In practice, however, too many different types of visual signals may interfere with each other. For example, overlapping visual signals can lead to barcode misidentification.
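The trade-off between F and N in steps 3020 and 3030 can be made concrete with a small helper that returns the fewest rounds N for which F^(N) covers a given number of targets, before any error-correction rounds are added. This is a minimal sketch; the function name `min_rounds` is hypothetical.

```python
def min_rounds(n_targets, n_colors):
    """Smallest N such that n_colors ** N >= n_targets (no error correction)."""
    if n_colors < 2 or n_targets < 1:
        raise ValueError("need at least 2 colors and at least 1 target")
    n_rounds, capacity = 1, n_colors
    while capacity < n_targets:
        n_rounds += 1
        capacity *= n_colors
    return n_rounds


for genes, colors in [(100, 5), (15_625, 5), (100, 4)]:
    n = min_rounds(genes, colors)
    print(f"{genes} targets, F = {colors}: N = {n}, capacity {colors ** n}")
```

With F = 5, 100 genes need N = 3 (125 codes) and 15,625 genes need N = 6; with only F = 4, the same 100 genes need N = 4. Error-correction rounds are added on top of this minimum.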

At step 3040, a library of drop-safe unique barcodes is generated by implementing one or more error correction mechanisms.

In some embodiments, a repeat round can be performed for any round during a sequential hybridization of N rounds, resulting in a new sequential hybridization of (N+1) rounds. The extra repeat round can be an error correction round. The repeat round can be a duplicate of any round of the N-round sequential hybridization, and it can take place as any round of the (N+1)-round sequential hybridization.

After the repeat, there are two rounds of hybridization that should be identical to each other. Consequently, the complete loss of one of the repeat rounds does not affect the outcome of the sequential hybridization. As such, either of the repeat rounds is a drop-safe round.

FIG. 2 illustrates an experiment in which 3 rounds of hybridization using probes with 4 types of detectable visual signals (red: R, yellow: Y, green: G, and cyan: C) are used to create barcodes for 4 different mRNA molecules. Hybridization round 3 is a repeat of hybridization round 1, as summarized in Table 1 below.

TABLE 1 Illustration of the effect of repeat hybridization rounds.
mRNA molecule | Color barcode (3 rounds) | Color barcode (dropping round 1) | Color barcode (dropping round 3)
mRNA1 | Y-C-Y | C-Y | Y-C
mRNA2 | G-R-G | R-G | G-R
mRNA3 | R-C-R | C-R | R-C
mRNA4 | C-R-C | R-C | C-R

As shown in the table above, data from one of the repeat rounds can be dropped completely in case of a major experimental error, and the barcodes derived from the remaining rounds of hybridization still uniquely represent the mRNA molecules.
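The property shown in Table 1 can be checked mechanically: remove the same round from every barcode and confirm that no two molecules end up with the same remaining sequence. This is a minimal sketch; the dictionary `TABLE_1` simply transcribes the table, and the helper names are hypothetical.

```python
# Barcodes from Table 1: rounds 1, 2, 3 (round 3 repeats round 1).
TABLE_1 = {
    "mRNA1": ("Y", "C", "Y"),
    "mRNA2": ("G", "R", "G"),
    "mRNA3": ("R", "C", "R"),
    "mRNA4": ("C", "R", "C"),
}


def drop_round(codebook, r):
    """Return the codebook with (0-based) round r removed from every barcode."""
    return {gene: code[:r] + code[r + 1:] for gene, code in codebook.items()}


def is_unique(codebook):
    """True if no two targets share the same barcode."""
    return len(set(codebook.values())) == len(codebook)


for r in (0, 2):  # drop round 1, or its repeat, round 3
    reduced = drop_round(TABLE_1, r)
    print(f"drop round {r + 1}:", reduced, "unique:", is_unique(reduced))
```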

In some embodiments, even in a questionable hybridization round, most of the information is still reliable; only some of the bindings between probes and target sequences yield inaccurate information. In some embodiments, partial data from a questionable round of hybridization are used. For example, in the illustration above, binding signals can be missing or ambiguous for a particular location during hybridization round 1, which can produce an incomplete three-letter barcode *-C-Y for that location, where * remains undetermined. In the scheme illustrated, the identity of * is not needed to determine that the code is for mRNA1. Similarly, binding signals can be missing or ambiguous for a particular location during hybridization round 2, which can produce an incomplete three-letter barcode R-*-R for that location, where * remains undetermined. Once again, the identity of * is not needed to determine that the code is for mRNA3.
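Decoding a partial barcode amounts to keeping every codebook entry that agrees with all of the rounds that were called, and accepting the result only when exactly one entry remains. The sketch below assumes the Table 1 codebook and uses `None` for an undetermined round (the `*` above); the function name `decode` is hypothetical.

```python
TABLE_1 = {
    "mRNA1": ("Y", "C", "Y"),
    "mRNA2": ("G", "R", "G"),
    "mRNA3": ("R", "C", "R"),
    "mRNA4": ("C", "R", "C"),
}


def decode(observed, codebook):
    """Match a possibly incomplete barcode against a codebook.

    observed holds one color per round, with None for a missing or ambiguous
    call. Returns the single matching target, or None if zero or several
    targets are consistent with the determined rounds.
    """
    matches = [
        gene
        for gene, code in codebook.items()
        if len(code) == len(observed)
        and all(o is None or o == c for o, c in zip(observed, code))
    ]
    return matches[0] if len(matches) == 1 else None


print(decode((None, "C", "Y"), TABLE_1))  # mRNA1
print(decode(("R", None, "R"), TABLE_1))  # mRNA3
```

A call such as (None, "C", None) is consistent with both mRNA1 and mRNA3, so it is left undecoded rather than guessed.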

Additionally, data from repeat rounds can validate each other. For example, in FIG. 2C, a circle highlights a cyan data point in the image corresponding to hybridization round 2. At the same location, the image corresponding to hybridization round 3 reveals a yellow data point. Based only on information from hybridization rounds 2 and 3, this location would be identified as part of mRNA1. However, because round 3 repeats round 1, a genuine mRNA1 molecule should also give a yellow signal in round 1, and no signal is identified at the location during hybridization round 1. This suggests that the highlighted data points may be due to non-specific binding.
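Because one round repeats another, each spot can carry a simple consistency flag alongside its decoded identity. The sketch below compares the two calls that should be identical (rounds 1 and 3 in Table 1); the three status labels and the function name are hypothetical choices for illustration, not part of the disclosed analysis.

```python
def repeat_status(observed, repeat_rounds=(0, 2)):
    """Compare the two calls that should match because one round repeats the other.

    Returns 'confirmed' if both calls are present and equal, 'conflict' if both
    are present but differ, and 'unconfirmed' if either call is missing (None).
    """
    first, second = (observed[r] for r in repeat_rounds)
    if first is None or second is None:
        return "unconfirmed"
    return "confirmed" if first == second else "conflict"


# The FIG. 2C case: nothing in round 1, cyan in round 2, yellow in round 3.
print(repeat_status((None, "C", "Y")))  # unconfirmed
print(repeat_status(("Y", "C", "Y")))   # confirmed
```

A spot like the one circled in FIG. 2C would be flagged "unconfirmed", so downstream analysis could treat it as possible non-specific binding rather than counting it as mRNA1.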

In some embodiments, a more sophisticated barcode-generating algorithm is used such that the resulting barcodes can withstand the loss of any round, or even multiple rounds, of hybridization data. In some embodiments, a barcode generator is used to generate the drop-safe barcodes. For example, FIG. 32 illustrates an example in which probes with 5 different visual signals (blue, green, red, purple, and yellow) are used in 4 rounds of hybridization. One of the hybridization rounds is an error correction round whose barcode values are generated from the barcode values of the other 3 rounds. The following example illustrates how the barcodes are generated.

Designing an error correction code to correct for m errors in a message of length n is analogous to packing as many spheres of radius m as possible into an n-dimensional cube. There are examples of “perfect codes,” such as Golay and Hamming codes, that are as efficient as possible in this packing design. These perfect codes are important in digital communication because the word lengths are long, up to billions of letters for gigabytes of data, and many forms of errors can occur, including deletions and insertions. However, in seqFISH experiments, because the code lengths are short, a perfect error correction code is not necessary, especially as the “correct” codes are already defined. One of the major sources of error is deletion due to the loss of a hybridization round. Thus, it is possible to design simple correction schemes that are not completely efficient (i.e., do not obtain the tightest packing density for the n-spheres) but can achieve good error correction with just a few extra rounds of hybridization.

Designing a barcode scheme that can tolerate the loss of a single round of hybridization is akin to collapsing an n-dimensional hypercube along any one of its dimensions into an (n−1)-dimensional hypercube without any two barcode points on the n-dimensional hypercube mapping to the same point. For this to be true, no two barcodes can be connected by a 1D line running parallel to any of the axes; equivalently, every two barcodes must differ in at least two rounds (a minimum Hamming distance of 2). There are many solutions for generating such a code that tolerates the loss of 1 round.
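The geometric condition can be tested directly: two barcodes collide after one round is deleted exactly when they differ only in that round, so a code survives any single drop if and only if its minimum pairwise Hamming distance is at least 2. The sketch below checks both quantities on two small hypothetical 2-round codes with 5 colors; the helper names are illustrative.

```python
from itertools import combinations


def hamming(a, b):
    """Number of rounds in which two barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def survives_any_single_drop(codes):
    """True if deleting any one round still leaves all barcodes distinct."""
    n_rounds = len(codes[0])
    return all(
        len({c[:r] + c[r + 1:] for c in codes}) == len(codes)
        for r in range(n_rounds)
    )


def min_distance(codes):
    """Smallest Hamming distance between any two barcodes."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


plain = [(a, b) for a in range(1, 6) for b in range(1, 6)]  # every 2-round code
repeated = [(a, a) for a in range(1, 6)]                     # round 2 repeats round 1
for codes in (plain, repeated):
    print(min_distance(codes), survives_any_single_drop(codes))
```

The full 25-code set has neighbors at distance 1 and fails; the repeated-round set keeps only 5 codes, all at distance 2, and survives any single drop. The error-correction constructions that follow aim to reach distance 2 while keeping far more codes than a plain repeat.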

In this example, 4 rounds of hybridization are used. Here, 5 different visual signals (blue, green, red, purple, and yellow) are assigned numerical values. In some embodiments, the numerical values are integers, for example, blue = 1, green = 2, red = 3, purple = 4, and yellow = 5. It would be understood that these are merely sample values; any non-redundant numerical values can be assigned to represent the different types of visual signals. In some embodiments, a barcode generator is used to generate the barcodes used in the experiment. In the exemplary embodiment, a drop-safe barcode for a particular target gene can be defined as a four-component linear array: {i, (i+j+k) mod 5, j, k}. Here, mod (the modulo operation) finds the remainder after division of one number by another. For example, 8 mod 5 is 3, and 5 mod 5 is 0. Because no color is assigned the value 0, a result of 0 is read as the equivalent value 5 (yellow).

In this example, i represents the numerical value corresponding to the visual signal observed for the particular target gene during the first round of hybridization; (i+j+k) mod 5 represents the value observed during the second round; j represents the value observed during the third round; and k represents the value observed during the fourth round. In this example, i, j, and k can each be 1, 2, 3, 4, or 5, or any one of the numerical values that have been assigned to the five types of visual signals used in the experiment.

In this example, the (i+j+k) mod 5 component serves as the error correction round. However, once complete barcodes are generated, any one of rounds 1 through 4 can be dropped to yield unique 3-component barcodes. As such, the barcodes determined by this method can be used to correct errors in any round.

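The following table illustrates how 1-drop-tolerant barcodes can be generated using (i+j+k) mod 5. In Table 2, the (i+j+k) mod 5 value is assigned to the 4th round rather than the 2nd; because the error correction value can occupy any position in the array, this does not change which drops are tolerated.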

TABLE 2 Illustration of the effect of repeat hybridization rounds.
Genes | 1st round of hyb* | 2nd round of hyb | 3rd round of hyb | 4th round of hyb: (i + j + k) mod 5
mRNA1 | 1 | 2 | 4 | 2
mRNA2 | 3 | 3 | 1 | 2
mRNA3 | 5 | 1 | 2 | 3
mRNA4 | 2 | 3 | 5 | 5
. . . | . . . | . . . | . . . | . . .
mRNA125 | 5 | 2 | 1 | 3
*The term “hyb.” stands for hybridization. Numerical values are assigned to color signals as follows: blue = 1; green = 2; red = 3; purple = 4; and yellow = 5.

As illustrated above, although the 4th round of hybridization is generated using an error correction algorithm, any one of the four rounds of hybridization in Table 2 can be dropped, and the remaining rounds still yield a unique set of barcodes for 125 genes.
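The Table 2 construction can be reproduced by enumerating every (i, j, k) with values 1 to 5, appending (i + j + k) mod 5 with a remainder of 0 read as 5, and confirming that deleting any one of the four rounds still leaves 125 distinct barcodes. This is a minimal sketch using the color numbering from the table footnote; the function names are hypothetical.

```python
from itertools import product

F = 5  # blue = 1, green = 2, red = 3, purple = 4, yellow = 5


def check_value(i, j, k, F=F):
    """(i + j + k) mod F, with a remainder of 0 read as F (yellow = 5)."""
    r = (i + j + k) % F
    return F if r == 0 else r


def table2_codes(F=F):
    """All barcodes (i, j, k, check) in the column order of Table 2."""
    return [(i, j, k, check_value(i, j, k, F))
            for i, j, k in product(range(1, F + 1), repeat=3)]


codes = table2_codes()
print(len(codes))  # 125
for dropped in range(4):
    reduced = {c[:dropped] + c[dropped + 1:] for c in codes}
    print(f"drop round {dropped + 1}: {len(reduced)} distinct barcodes")
```

Dropping the check round leaves (i, j, k) intact, and dropping any data round still allows it to be recovered from the other two and the check value, which is why all four drops leave 125 distinct barcodes.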

More generally, a barcode that can resist the elimination of one roundof hybridization can be defined as:

$$\{\, j_1,\ j_2,\ \ldots,\ (a_1 j_1 + a_2 j_2 + \cdots + a_n j_n + C) \bmod F,\ \ldots,\ j_n \,\} \qquad (1)$$

where j₁ is a numerical value that corresponds to the detectable visual signal used in the first round of hybridization, j₂ is a numerical value that corresponds to the detectable visual signal used in the second round of hybridization, and jₙ is a numerical value that corresponds to the detectable visual signal used in the nth round of hybridization. In some embodiments, j₁, j₂, . . . , jₙ are non-redundant integers. In some embodiments, a₁, a₂, . . . , aₙ can be any nonzero integers. In some embodiments, C is a constant integer. In some embodiments, C is zero. The remainder of F divided by F is 0 (F mod F = 0), so F and 0 are equivalent. There is no limitation on the number of rounds of hybridization. One such example is shown in FIG. 37.

Array (1) is a general representation of a barcode that is safe against the drop or loss of one round of hybridization. Although (a₁j₁ + a₂j₂ + . . . + aₙjₙ + C) mod F is the designated error correction round, in some embodiments, the barcode is safe against the loss or drop of any one round of hybridization.
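Array (1) can be turned into a small generator that enumerates every data combination, computes the weighted check value, and inserts it at any chosen position. This is a minimal sketch under one labeled assumption: each coefficient aᵢ must share no common factor with F (for a prime F such as 5, this simply means aᵢ is not a multiple of F), which is what lets a lost data round be solved for from the check value. The function names are hypothetical.

```python
from itertools import product
from math import gcd


def array1_codebook(n, F, coeffs, C=0, position=None):
    """Barcodes in the form of array (1): n data rounds plus one check round.

    Data values run 1..F. The check value (a1*j1 + ... + an*jn + C) mod F is
    inserted at index `position` (default: last), with a remainder of 0 read as F.
    """
    if len(coeffs) != n:
        raise ValueError("need one coefficient per data round")
    if any(gcd(a, F) != 1 for a in coeffs):
        raise ValueError("every coefficient must be invertible modulo F")
    position = n if position is None else position
    codebook = []
    for data in product(range(1, F + 1), repeat=n):
        check = (sum(a * j for a, j in zip(coeffs, data)) + C) % F
        check = check or F  # a remainder of 0 is equivalent to F
        codebook.append(data[:position] + (check,) + data[position:])
    return codebook


def survives_any_single_drop(codes):
    """True if deleting any one round still leaves all barcodes distinct."""
    n_rounds = len(codes[0])
    return all(len({c[:r] + c[r + 1:] for c in codes}) == len(codes)
               for r in range(n_rounds))


# {i, (i+j+k) mod 5, j, k}: the check value placed in the 2nd round.
layout = array1_codebook(3, 5, [1, 1, 1], position=1)
print(len(layout), survives_any_single_drop(layout))  # 125 True
# A non-prime F with coefficients coprime to it, e.g. F = 4, a = (1, 3).
print(survives_any_single_drop(array1_codebook(2, 4, [1, 3])))  # True
```

The coprimality check is stricter than "nonzero": with F = 4, a coefficient of 2 maps two different colors to the same contribution, so losing that round could not be corrected.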

As disclosed herein, array (1) has one component per round of hybridization, each corresponding to the visual signal from that round. In some embodiments, probes binding to a particular gene are all associated with the same detectable visual signal, for example, red, green, or blue. In some embodiments, probes binding to a particular gene are associated with a combination of detectable visual signals, for example, green+yellow or blue+red. Through such combinations of visual signals, the total number of distinct detectable visual signals can be further expanded.

In some embodiments, barcodes can be designed such that the drop or loss of data from two rounds of hybridization can be tolerated. Using 2 additional rounds of hybridization does not correct for all possible combinations of 2 drops, but it does correct for a large fraction of them. For example, for detecting 100 genes with F = 5 dyes, 3 rounds of hybridization are needed for basic barcoding of these genes. When two rounds of hybridization are added, the error correction code can be:

{i,j,k,(i+j+k)mod F,(i−j)mod F}  (2)

Such codes can correct for every combination of 2 drops except dropping hybridization rounds 3 and 4 together (9 of the 10 possible pairs). Here, each component in the 5-member array represents one round of hybridization.

Similarly, an error correction code such as

{i,j,k,(i+j+k)mod F,(i−k)mod F}  (3)

can correct for every combination of 2 drops except dropping hybridization round 2 and hybridization round 4 together. Again, each component in the 5-member array represents one round of hybridization.
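The double-drop coverage of codes (2) and (3) can be checked by brute force: enumerate every barcode, delete each pair of rounds, and see whether two barcodes collapse to the same remaining sequence. This is a minimal sketch, assuming F = 5 and symbols written as residues 0 to F−1; reading 0 as F, as in the text, is a relabeling that does not change the result. The helper names `build_code` and `failing_drops` are hypothetical.

```python
from itertools import combinations, product

F = 5


def build_code(check_fns, n_data, F=F):
    """Every barcode: n_data data symbols followed by one symbol per check function."""
    return [d + tuple(fn(*d) % F for fn in check_fns)
            for d in product(range(F), repeat=n_data)]


def failing_drops(codes, n_drops):
    """1-based sets of rounds whose simultaneous loss makes two barcodes identical."""
    n_rounds = len(codes[0])
    failures = []
    for dropped in combinations(range(n_rounds), n_drops):
        keep = [p for p in range(n_rounds) if p not in dropped]
        if len({tuple(c[p] for p in keep) for c in codes}) < len(codes):
            failures.append(tuple(p + 1 for p in dropped))
    return failures


code2 = build_code([lambda i, j, k: i + j + k, lambda i, j, k: i - j], n_data=3)
code3 = build_code([lambda i, j, k: i + j + k, lambda i, j, k: i - k], n_data=3)
print(failing_drops(code2, 2))  # [(3, 4)]
print(failing_drops(code3, 2))  # [(2, 4)]
```

Each code leaves exactly one pair uncorrectable: code (2) cannot recover k once k and the sum are both lost, and code (3) cannot recover j once j and the sum are both lost.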

For example, to code for most of the transcriptome, only 6 rounds of hybridization are needed when F = 5 (5⁶ = 15,625). When two rounds of hybridization are added, the following error correction code is generated:

{i,j,k,l,m,n,(i+k+l+m+n)mod F,(i−j−k−l+n)mod F}  (4)

There are a total of 28 ways in which 2 of the 8 rounds of hybridization can be lost or dropped. This type of code can correct for 24 of the 28 combinations. Here, each component in the 8-member array represents one round of hybridization. Similarly, the 1st error correction round can be any linear combination of 5 of the 6 rounds of hybridization (e.g., without j), and the 2nd error correction round can be another linear combination of 5 of the 6 rounds of hybridization (e.g., without m). In these embodiments, the 2nd error correction round can use different coefficients, as long as it does not use exactly the same 5 indices as the 1st error correction round.

To correct fully for all combinations of drop or loss of 2 rounds of hybridization (2 drops), 3 additional rounds of hybridization are needed. Again, for 6 rounds of hybridization with 5 types of detectable signals (F = 5), three extra rounds of hybridization are added to create the full 9-member error correction code:

{i,j,k,l,m,n,(i+k+l+m+n) mod F,(i−j−k−l) mod F,(m−n−j+k) mod F}  (5)

In some embodiments, there are many equivalent codes that can correct for 2 drops with 3 additional rounds of hybridization, and they can all be determined empirically. For any reasonable number of hybridization rounds, candidate codes can be simulated to identify a code that corrects completely.
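The same kind of brute-force simulation can evaluate the six-round codes (4) and (5) by deleting every pair of rounds. Here the first check round of code (5) is read as (i+k+l+m+n) mod F, interpreting the printed "1" as the index l and the doubled "+" as a typo; that reading is an assumption. Symbols are residues 0 to 4, and the helper names are hypothetical.

```python
from itertools import combinations, product

F = 5


def build_code(check_fns, n_data, F=F):
    """Every barcode: n_data data symbols followed by one symbol per check function."""
    return [d + tuple(fn(*d) % F for fn in check_fns)
            for d in product(range(F), repeat=n_data)]


def failing_drops(codes, n_drops):
    """1-based sets of rounds whose simultaneous loss makes two barcodes identical."""
    n_rounds = len(codes[0])
    failures = []
    for dropped in combinations(range(n_rounds), n_drops):
        keep = [p for p in range(n_rounds) if p not in dropped]
        if len({tuple(c[p] for p in keep) for c in codes}) < len(codes):
            failures.append(tuple(p + 1 for p in dropped))
    return failures


code4 = build_code([
    lambda i, j, k, l, m, n: i + k + l + m + n,
    lambda i, j, k, l, m, n: i - j - k - l + n,
], n_data=6)
code5 = build_code([
    lambda i, j, k, l, m, n: i + k + l + m + n,  # assumed reading of code (5)
    lambda i, j, k, l, m, n: i - j - k - l,
    lambda i, j, k, l, m, n: m - n - j + k,
], n_data=6)
print(failing_drops(code4, 2))  # [(1, 6), (2, 8), (3, 4), (5, 7)]
print(failing_drops(code5, 2))  # []
```

Code (4) fails on 4 of the 28 pairs (24 corrected), and under the stated reading code (5) corrects all 36 possible pairs of its 9 rounds.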

In some embodiments, three additional rounds of hybridization can correct for the majority of errors due to the drop or loss of three rounds of hybridization. For example, for 6 rounds of hybridization with 5 types of detectable signals (F = 5), three extra rounds of hybridization are added to create the 9-member error correction code:

{i,j,k,l,m,n,(k+i−l+m−n) mod F,(i−l+j−k+m) mod F,(l−n−j−k+i) mod F}  (6)

Similar to the previous example, 3 additional rounds of hybridization can correct for the majority of combinations in which 3 rounds of hybridization are lost or dropped. There are a total of 84 ways in which 3 of the 9 rounds can be lost or dropped, and a 9-component code as illustrated in (6) can correct for 72 of the 84 combinations.
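Because enumerating all 5⁶ barcodes is cheap, the 72-of-84 figure for code (6) can be reproduced by deleting every triple of rounds and counting the triples after which two barcodes become identical. This is a minimal sketch with symbols written as residues 0 to 4; `build_code` and `failing_drops` are hypothetical helper names.

```python
from itertools import combinations, product

F = 5


def build_code(check_fns, n_data, F=F):
    """Every barcode: n_data data symbols followed by one symbol per check function."""
    return [d + tuple(fn(*d) % F for fn in check_fns)
            for d in product(range(F), repeat=n_data)]


def failing_drops(codes, n_drops):
    """1-based sets of rounds whose simultaneous loss makes two barcodes identical."""
    n_rounds = len(codes[0])
    failures = []
    for dropped in combinations(range(n_rounds), n_drops):
        keep = [p for p in range(n_rounds) if p not in dropped]
        if len({tuple(c[p] for p in keep) for c in codes}) < len(codes):
            failures.append(tuple(p + 1 for p in dropped))
    return failures


code6 = build_code([
    lambda i, j, k, l, m, n: k + i - l + m - n,
    lambda i, j, k, l, m, n: i - l + j - k + m,
    lambda i, j, k, l, m, n: l - n - j - k + i,
], n_data=6)
failures = failing_drops(code6, 3)
total = len(list(combinations(range(9), 3)))
print(total, total - len(failures))  # 84 72
```

The 12 uncorrectable triples include, for example, losing j together with the last two check rounds: the remaining check round (k+i−l+m−n) mod F does not involve j, so j cannot be recovered.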

In some embodiments, 4 additional rounds of hybridization can correct for the drop or loss of any three rounds of hybridization. An example 10-component code is as follows:

{i,j,k,l,m,n,(k+1+1+m+n)mod F,(i−l+j−k+m)mod F,(l−j−k+i)modF,(n−k−i−j+m)mod F}  (7)

It will be understood that there are many other solutions that can be determined empirically. For higher numbers of drops, similar correction schemes can be determined empirically.

For 16,000 species, this scheme allows 10 hybs (6 barcoding rounds giving 5⁶ = 15,625 codes, plus 4 error correction rounds) with the ability to correct 3 drops. In comparison, in MERFISH, 16 hybs are needed to target 140 species, with only 2-round correction ability. Because the more rounds of hybridization one performs, the more mistakes can be made, keeping the number of hybs low is crucial. Thus, this error correction scheme is very powerful compared to the Hamming distance scheme used in MERFISH. This is because Hamming distance correction comes from telecommunications with binary numbers, which uses much longer strings of 0s and 1s.

The design disclosed above can correct for the loss of 1 round of hybridization for an arbitrarily long barcode sequence with minimal extra effort. In this example, only one round of error correction is needed in a total of 4 rounds of hybridization that analyze 100 genes, which is below the capacity of 5⁴ (625).

For example, 7 rounds of hybridization with 5 colors can cover 5⁷ = 78,125 transcripts, more than the number of transcripts in the transcriptome; with 8 hybridizations, the entire transcriptome can be coded with error correction using the barcoding system disclosed herein.

Another consideration in designing error-tolerant barcodes is that the mechanism of rehybridization should guide the robustness of error correction. In the MERFISH implementation of seqFISH (Chen 2015), a null signal, or “0”, along with “1”, which is Cy5 fluorescence, is used to form a binary barcode. However, it is difficult to determine whether the absence of signal is due to mis-hybridization or to an actual null signal. In the seqFISH implementation, using positive signals as readouts during each round of hybridization reduces the need for error correction, because a false positive signal is unlikely to recur at the same position during another hybridization due to DNase stripping between hybridizations. Thus, implementation of seqFISH with 5 colors and 1 extra round of hybridization for error correction is both efficient and accurate, and it allows imaging of large tissue sections, since imaging time is ultimately limiting in multiplexing experiments.

At step 3050, sequential hybridization is carried out to associate or assign the barcodes from step 3040 to target genes in a sample. As disclosed herein, the sample can comprise immobilized mRNAs, DNAs, chromosomal DNAs, or combinations thereof. For example, in the 100-gene sequential hybridization example (see FIG. 32 and FIG. 40), 4 rounds of hybridization are carried out using probes associated with 5 different types of visual signals. Barcodes are assigned through the selection of probes during the 4 rounds of hybridization on immobilized nucleic acid samples.

At step 3060, after hybridization, visual signals are collected and used in further analysis. For example, images collected from different hybridization rounds are used to read out the barcodes for specific locations on the immobilized nucleic acid samples. Such barcodes can then be used to decipher the identity of the nucleic acid targets (see, for example, FIGS. 2, 32, 33, 40, and 41).

In one aspect, sequential hybridization and serial hybridization are combined for gene identification. In serial hybridization, only one round of hybridization is used to identify target genes. The method is particularly helpful when analyzing genes whose expression level is too high. In some embodiments, highly expressed genes, if included in a hybridization analysis with genes that are less highly expressed, would overpower the signals of the less highly expressed genes. In some embodiments, the method can also be applied to genes whose expression level is too low.

In some embodiments, expression levels of genes are predetermined. For example, gene expression levels (e.g., measured by mRNA transcription level) may already be available for certain species. It is possible to identify highly expressed genes by mining publicly available data, thus obviating the need to conduct additional experiments to measure expression levels.

In some embodiments, initial experiments are performed to determine the relative expression levels of candidate genes. In some embodiments, genes are grouped according to their expression levels. For example, genes with moderate or low expression levels can be grouped together and subjected to sequential hybridization analysis, while genes that are highly expressed can be subjected to serial hybridization analysis. In some embodiments, expression levels of different genes are compared to the same control gene to derive a relative expression level. For example, the expression level of actin can be used as a control. It will be understood that gene expression levels may vary by organism and can change with different internal and environmental factors. In some embodiments, data from existing expression analyses can be used to identify highly expressed genes. In some embodiments, a preliminary expression analysis is carried out before sequential and/or serial hybridization analysis.

In some embodiments, a threshold value is set for high expression. Any gene having an expression level above the threshold is excluded from sequential hybridization.

Depending on the types of detectable visual signals that are available, a serial hybridization experiment can detect as many target genes as there are types of detectable visual signals. For example, in the experiment illustrated in FIGS. 32 and 40, 5 genes are analyzed at the same time during one serial hybridization experiment.
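The grouping described above can be sketched as a simple planning step: genes above the expression threshold are assigned to serial hybridization rounds of at most F genes each (one visual signal per gene), and the remaining genes are left for sequential barcoding. This is a minimal sketch; the function name, the gene names, and the expression values are hypothetical placeholders, not measured data.

```python
def plan_hybridizations(expression, threshold, n_colors):
    """Split targets between serial and sequential (barcoded) hybridization.

    Targets whose expression exceeds `threshold` are placed in serial rounds of
    at most `n_colors` targets each (one visual signal per target); the rest
    are returned for sequential barcoding.
    """
    high = sorted(g for g, level in expression.items() if level > threshold)
    low = sorted(g for g, level in expression.items() if level <= threshold)
    serial_rounds = [high[s:s + n_colors] for s in range(0, len(high), n_colors)]
    return serial_rounds, low


# Hypothetical relative expression values (e.g., normalized to a control gene).
levels = {f"gene{x}": x for x in range(10)}
serial, sequential = plan_hybridizations(levels, threshold=2.5, n_colors=5)
print(serial)      # [['gene3', 'gene4', 'gene5', 'gene6', 'gene7'], ['gene8', 'gene9']]
print(sequential)  # ['gene0', 'gene1', 'gene2']
```

With 5 visual signals, seven highly expressed genes need two serial rounds, while the lower-expressed genes proceed to the drop-safe barcoding scheme described above.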

In some embodiments, when multiple target genes are present in one serial hybridization round, the number of probes that recognize each target gene is selected such that overlapping of signals is minimized or avoided. In some embodiments, the concentration of probes is selected to avoid or minimize overlapping of detectable signals.

Computer System

FIG. 31 depicts a diagram of an example system architecture for implementing the features and processes of the method disclosed herein, in particular the barcode design functionalities with an embedded error correction mechanism.

In one aspect, some embodiments can employ a computer system (such as the computer system 3100) to perform methods in accordance with various embodiments of the invention. An exemplary embodiment of computer system 3100 includes a bus 3102, one or more processors 3112, one or more storage devices 3114, at least one input device 3116, at least one output device 3118, a communication subsystem 3120, and working memory 3130, which includes an operating system 3132, device drivers, executable libraries, and/or other code, such as one or more application(s) 3134 (one or more of which implement the methods disclosed herein).

According to a set of embodiments, some or all of the procedures of such methods are performed by the computer system 3100 in response to processor 3112 executing one or more sequences of one or more instructions (which might be incorporated into operating system 3132 and/or other code, such as an application program 3134) contained in working memory 3130. Such instructions can be read into the working memory 3130 from another computer-readable medium, such as one or more of storage device(s) 3114. Merely by way of example, execution of the sequences of instructions contained in working memory 3130 might cause processor(s) 3112 to perform one or more procedures of the methods described herein. Additionally or alternatively, portions of the methods described herein can be executed through specialized hardware. Merely by way of example, a portion of one or more procedures described with respect to the method(s) discussed above, such as method 3000, and methods illustrated in FIGS. 3B and 30, might be implemented by processor 3112.

In some embodiments, computer system 3100 can further include (and/or be in communication with) one or more non-transitory storage devices 3114, which can comprise, without limitation, local and/or network-accessible storage, and/or can include, without limitation, a disk drive, a drive array, an optical storage device, or a solid-state storage device, such as a random access memory (“RAM”) and/or a read-only memory (“ROM”), which can be programmable, flash-updateable, and/or the like. Such storage devices can be configured to implement any appropriate data stores, including without limitation, various file systems, database structures, and/or the like. In some embodiments, the storage device 3114 can be an example of a local database of a user device or the server database of a server.

In some embodiments, computer system 3100 can further include one or more input devices 3116, which can comprise, without limitation, any input device that allows a computer device to receive commands from a user, such as a request for barcode design.

In some embodiments, computer system 3100 can further include one or more output devices 3118, which can comprise, without limitation, any output device that can receive information from a computer device and communicate such information to a user, to another computer device, to the environment of the computer device, or to a functional component communicably connected with the computer device. Examples of input/output devices include, but are not limited to, a display, a keyboard, and a mouse. For example, the results of the barcoding analysis can be presented on any one or more of the output devices, for example, as a two-dimensional heat map, a cluster map, a list, or a table.

It would be understood that any applicable input/output devices or components, such as those disclosed in connection with one or more user devices or servers, can be applied to input device 3116 and output device 3118.

In some embodiments, computer system 3100 might also include a communications subsystem 3120, which can include without limitation a modem, a network card (wireless or wired), an infrared communication device, a wireless communication device, and/or a chipset (such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, cellular communication facilities, etc.), and/or the like. Communications subsystem 3120 can include one or more input and/or output communication interfaces to permit data to be exchanged with a network, other computer systems, and/or any other electrical devices/peripherals. In many embodiments, computer system 3100 will further comprise a working memory 3130, which can include a RAM or ROM device, as described above.

In some embodiments, computer system 3100 also can comprise software elements, shown as being currently located within the working memory 3130, including an operating system 3132, device drivers, executable libraries, and/or other code, such as one or more application(s) 3134, which can comprise computer programs provided by various embodiments, and/or can be designed to implement methods, and/or configure systems, provided by other embodiments, as described herein. Merely by way of example, a portion of one or more procedures described with respect to the method(s) discussed above, such as the methods described in relation to FIG. 30, can be implemented as code and/or instructions executable by a computer (and/or a processing unit within a computer); in an aspect, then, such code and/or instructions can be used to configure and/or adapt a general purpose computer (or other device) to perform one or more operations in accordance with the described methods.

A set of these instructions and/or code might be stored on a non-transitory computer-readable storage medium, such as storage device(s) 3114 described above. In some cases, the storage medium might be incorporated within a computer system, such as computer system 3100. In other embodiments, the storage medium might be separate from a computer system (e.g., a removable medium, such as an optical disc), and/or provided in an installation package, such that the storage medium can be used to program, configure, and/or adapt a general purpose computer with the instructions/code stored thereon. These instructions might take the form of executable code, which is executable by computer system 3100, and/or might take the form of source and/or installable code, which, upon compilation and/or installation on the computer system 3100 (e.g., using any of a variety of generally available compilers, installation programs, compression/decompression utilities, etc.), then takes the form of executable code.

It will be apparent to those skilled in the art that substantial variations can be made in accordance with specific requirements. For example, customized hardware might also be used, and/or particular elements might be implemented in hardware, software (including portable software, such as applets, etc.), or both. Further, connection to other computing devices such as network input/output devices can be employed.

The terms “machine-readable medium” and “computer-readable medium,” as used herein, refer to any medium that participates in providing data that causes a machine to operate in a specific fashion. In an embodiment implemented using computer system 3100, various computer-readable media might be involved in providing instructions/code to processor(s) 3112 for execution and/or might be used to store and/or carry such instructions/code. In many implementations, a computer-readable medium is a physical and/or tangible storage medium. Such a medium can take the form of non-volatile media or volatile media. Non-volatile media include, for example, optical and/or magnetic disks, such as storage device(s) 3114. Volatile media include, without limitation, dynamic memory, such as working memory 3130.

Common forms of physical and/or tangible computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read instructions and/or code.

Various forms of computer-readable media can be involved in carrying one or more sequences of one or more instructions to processor(s) 3112 for execution. Merely by way of example, the instructions can initially be carried on a magnetic disk and/or optical disc of a remote computer. A remote computer might load the instructions into its dynamic memory and send the instructions as signals over a transmission medium to be received and/or executed by computer system 3100.

Communications subsystem 3120 (and/or components thereof) generally will receive signals, and bus 3102 then might carry the signals (and/or the data, instructions, etc. carried by the signals) to working memory 3130, from which processor(s) 3112 retrieves and executes the instructions. The instructions received by working memory 3130 can optionally be stored on non-transitory storage device 3114 either before or after execution by processor(s) 3112.

The methods and systems are provided by way of illustration only. They should in no way limit the scope of the present invention.

Having described the invention in detail, it will be apparent that modifications, variations, and equivalent embodiments are possible without departing from the scope of the invention defined in the appended claims. Furthermore, it should be appreciated that all examples in the present disclosure are provided as non-limiting examples.

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

EXEMPLIFICATION

The foregoing has been a description of certain non-limiting embodiments of the invention. Accordingly, it is to be understood that the embodiments of the invention herein described are merely illustrative of the application of the principles of the invention. Reference herein to details of the illustrated embodiments is not intended to limit the scope of the claims.

Example 1 Pseudo-Color Based Barcoding Sample Preparation

FIG. 5 illustrates mRNA transcripts immobilized on a surface through poly-A-tail capture or hydrogel embedment. Once cells are lysed, the cell lysate or purified total RNA can be immobilized on a coverslip surface functionalized with DNA or LNA poly-T, which captures the poly-A tail of the mRNA. Alternatively, the mRNA can be mixed with a hydrogel such as polyacrylamide gel and allowed to gel on a coverslip surface, where the pores formed trap the mRNA molecules on the surface.

Design of Primary Probes

FIG. 6 illustrates an exemplary embodiment of gene-specific primary probe design and sequential barcoding hybridization on mRNA immobilized on a surface. In particular, FIG. 6a shows that the primary probes are not directly labeled with a fluorophore. Each primary probe contains (i) a gene-specific region of 20 nt to 35 nt that targets the mRNA, (ii) one or multiple secondary readout probe binding sites, (iii) a PCR primer pair, and optionally (iv) restriction enzyme cutting sites and (v) spacer nucleotides. Each target mRNA requires at least 20 primary probes. These primary probes are synthesized as a complex oligo pool, which is later amplified with the unique PCR primers to obtain a complete set of probes. The primary probes can be attached to the target sequence either on the surface or in solution (FIG. 6b). Different primary binding sequences are associated with different readout sequences, which can be selectively turned on during hybridization rounds.

In this case, mRNA 1 is barcoded with red and mRNA 2 is barcoded with blue in the first round of hybridization. After imaging, a signal extinguishing step is performed, followed by the next round of barcoding hybridization. Each mRNA receives 4 colors in total, which gives it a unique identity that distinguishes it from the others. The barcode capacity scales as F^(N), where F is the number of fluorophores and N is the number of barcoding hybridizations (FIG. 6c).
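The F^(N) scaling can be illustrated with a short enumeration. The Python sketch below is illustrative only; the four-color palette and the function names are hypothetical and do not correspond to a particular fluorophore set described herein.

```python
from itertools import product


def barcode_capacity(num_fluorophores: int, num_rounds: int) -> int:
    """Number of distinct color sequences available: F ** N."""
    return num_fluorophores ** num_rounds


def enumerate_barcodes(colors, num_rounds):
    """Every ordered sequence of colors, one color per barcoding round."""
    return list(product(colors, repeat=num_rounds))


# Hypothetical palette of four fluorophores, four barcoding rounds.
palette = ["red", "blue", "green", "yellow"]
codes = enumerate_barcodes(palette, 4)
print(len(codes), barcode_capacity(len(palette), 4))  # 256 256
print(codes[0])  # ('red', 'red', 'red', 'red')
```

Each additional barcoding round multiplies the capacity by F, which is why the number of rounds, rather than the number of dyes, is the main lever for scaling.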

Arrangement of Secondary Probe Binding Sites

FIG. 7 illustrates different arrangements of secondary probe binding sites on primary probes. Assuming 24 primary probes are used for each gene, with 4 rounds of barcoding hybridization using 4 different unique secondary probe binding sequences, there are various ways of arranging the secondary probe binding sequences on the primary probes. Multiple or all unique secondary probe binding sequences can be found in one probe, for example, as overhang sequences connected to the primary binding sequence in various combinations (FIGS. 7a-7c).

Signal Extinguishing Steps

FIG. 8 depicts possible ways to extinguish fluorescent signals. For example, a high concentration of formamide can be introduced to ‘melt’ or strip off the secondary readout probes (FIG. 8a). Alternatively, chemical cleavage can be applied (FIG. 8b); for example, in one implementation, a dye conjugated to the secondary readout probe through a disulfide linker can be released by reduction with TCEP, DTT, or another reducing agent, which cleaves the linker and thus removes the fluorophore from the primary probe.

Scaling Up the Number of ‘Fluorophore’ by Serial Hybridization

FIG. 10 depicts a table showing possible barcodes obtained using the pseudo-color barcode scheme. In this example, 3 different color signals, 4 serial hybridization rounds, and 4 barcoding rounds were used. In this implementation, 16 rounds of imaging are performed with a total of 48 unique secondary readout adapters, each conjugated to one of the 3 fluorophores.

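One can adjust F (the number of fluorophores), n (the number of serial hybridizations in each barcoding round), and N (the number of barcoding rounds) to find the best strategy for generating a desired barcode capacity, which scales as (F×n)^(N). For example, to reduce the number of barcoding rounds to 3, 30 unique secondary readout sequences are needed in each barcoding round, as 30³=27,000. Because only a fraction of the targets is imaged in each serial hybridization, this scheme lowers the density of targets detected per channel in each hybridization round, allowing a larger number of genes to be detected in situ.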
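The trade-off between F, n, and N can be tabulated with simple arithmetic. The Python sketch below assumes, as an interpretation of the text, that each barcoding round offers F×n pseudo-colors and needs one unique readout sequence per pseudo-color; all function names are hypothetical.

```python
def pseudo_colors(num_fluorophores: int, serial_hybs: int) -> int:
    """'Colors' per barcoding round: F fluorophores in each of n serial hybridizations."""
    return num_fluorophores * serial_hybs


def capacity(num_fluorophores: int, serial_hybs: int, barcoding_rounds: int) -> int:
    """Barcode capacity (F * n) ** N."""
    return pseudo_colors(num_fluorophores, serial_hybs) ** barcoding_rounds


def imaging_rounds(serial_hybs: int, barcoding_rounds: int) -> int:
    """One imaging round per serial hybridization."""
    return serial_hybs * barcoding_rounds


def readouts_needed(num_fluorophores: int, serial_hybs: int, barcoding_rounds: int) -> int:
    """One unique readout sequence per pseudo-color per barcoding round."""
    return pseudo_colors(num_fluorophores, serial_hybs) * barcoding_rounds


def min_pseudo_colors(target: int, barcoding_rounds: int) -> int:
    """Smallest number of pseudo-colors c with c ** N >= target."""
    c = 1
    while c ** barcoding_rounds < target:
        c += 1
    return c


# FIG. 10 configuration: F = 3, n = 4, N = 4.
print(capacity(3, 4, 4), imaging_rounds(4, 4), readouts_needed(3, 4, 4))  # 20736 16 48
# Covering 27,000 codes in three barcoding rounds needs 30 pseudo-colors per round.
print(min_pseudo_colors(27_000, 3))  # 30
```

With 3 fluorophores, 30 pseudo-colors per round would correspond to 10 serial hybridizations per barcoding round, which illustrates how fewer barcoding rounds trade off against more serial hybridizations.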

Time Efficient Error Correction Scheme

A time-efficient error correction scheme can be implemented in the current coding scheme to tolerate the drop of one ‘color’ when decoding the full color code, for example due to loss of hybridization. It is possible to design highly efficient, though not perfect, correction schemes by seeking the tightest packing density of n-spheres. A barcode generator (i, (i+j+k) mod 5, j, k) can be used to generate barcodes that tolerate 1 round of mis-hybridization. For example, 6 rounds of hybridization with 5 colors can cover 15,625 genes (5⁶); one can design the barcode scheme with an extra round of hybridization so that the 15,625 genes can still be identified even if any one round of hybridization is lost from the barcode. More rounds can be corrected with additional error correction hybridization rounds. For example, barcodes can be designed such that shortened barcodes are still unique after data for two or more rounds of hybridization are lost.
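The property claimed for the generator (i, (i+j+k) mod 5, j, k), namely that barcodes remain unique when any one round is lost, can be checked exhaustively. The Python sketch below is one way to do so; `add_parity_round` is an assumed generalization (appending the sum of the other rounds modulo the number of colors) used to illustrate the 6-round, 5-color example with one extra round, and is not necessarily the exact construction used.

```python
from itertools import product


def generator_barcodes(q: int = 5):
    """Barcodes (i, (i + j + k) mod q, j, k) from the generator in the text."""
    return [(i, (i + j + k) % q, j, k) for i, j, k in product(range(q), repeat=3)]


def add_parity_round(codes, q: int):
    """Append one extra round whose color is the sum of the other rounds mod q."""
    return [tuple(code) + (sum(code) % q,) for code in codes]


def survives_single_drop(codes) -> bool:
    """True if the codes remain unique after deleting any one round."""
    length = len(codes[0])
    for dropped in range(length):
        shortened = {c[:dropped] + c[dropped + 1:] for c in codes}
        if len(shortened) != len(codes):
            return False
    return True


codes4 = generator_barcodes(5)
print(len(codes4), survives_single_drop(codes4))  # 125 True

# Without a parity round, losing any round collapses 125 codes onto 25.
plain3 = list(product(range(5), repeat=3))
print(survives_single_drop(plain3))  # False

# 6 data rounds of 5 colors (5**6 = 15625) plus one parity round.
codes7 = add_parity_round(product(range(5), repeat=6), 5)
print(len(codes7), survives_single_drop(codes7))  # 15625 True
```

Any single missing color can be recovered because it is the only unknown in one linear relation modulo q, so the cost of tolerance to one dropped round is exactly one extra hybridization round.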

Hybridization Experiment

To demonstrate that the approach works, we targeted 64 genes in mouse NIH3T3 cells (FIG. 10). These 64 genes were chosen randomly to span a range of FPKM values in bulk RNA-sequencing data. For each gene, a set of 24 primary probes was designed as illustrated in FIG. 2, each containing a 35-nt gene-specific targeting region, 3 unique secondary readout binding sequences, a pair of PCR primers, and a single-nucleotide ‘T’ spacer. The primary probes were synthesized in a complex oligo pool and amplified by PCR. The barcodes were generated as 4³=64 (4 ‘colors’ in each of 3 barcoding rounds), without implementing an error correction scheme. Three fluorophores were used (Cy3b, Alexa 594, and Alexa 647), and the number of ‘fluorophores’ was scaled up as described in FIG. 4 with a slight variation. Briefly, in each round of barcoding hybridization, 3 of the 4 unique secondary readout probes were coupled to the 3 fluorophores, and the fourth was coupled to one of the fluorophores and hybridized in a later hybridization. In this experiment, the fourth unique secondary readout probes of the barcoding rounds were coupled to Alexa 647, Alexa 594, and Cy3B, so that only one extra round of imaging was needed to scale up the number of ‘fluorophores’. The readout probes used in this proof-of-concept experiment were 18-mers. These secondary readout probes were conjugated to dye through an amine-NHS ester reaction. The fluorescent signals in each hybridization were extinguished with a high concentration of formamide (60% formamide).

This proof of principle shows that the secondary probe readout scheme, using a combination of serial and barcoding steps, can be scaled up to perform whole transcriptome profiling. Probes can be designed to target over 20,000 genes in the transcriptome, which can be read out with 4 barcoding rounds of hybridization using 12 base “colors” (12⁴=20,736), performed in a similar fashion to this 64-gene experiment.

FIG. 11 depicts an exemplary embodiment of data analysis: a Pearson's correlation plot of the copy number determined by this technology versus FPKM values determined from bulk RNA-seq. This shows that the measurements from the sequential hybridization barcoding experiments are highly accurate.

The barcoding scheme illustrated in FIGS. 8 and 9 was used for multiplex detection of 1000 different transcription factor mRNAs with in vitro RNA SPOTs. First, NIH3T3 cells were grown in a 6-well plate until 60% to 80% confluency. Then the cells were lysed and the RNA was purified with the Qiagen RNeasy Mini Kit. Next, 100 ng of total RNA in RNA Binding Buffer, consisting of 1 M LiCl, 40 mM Tris-HCl pH 7.5, 2 mM EDTA, 0.1% Triton X-100, and 20 units of SUPERase Inhibitor (Thermo Fisher), was captured on an LNA-functionalized coverslip overnight at room temperature. The capture time and amount of total RNA can be modified to adjust the dot density on a coverslip in one field of view. Next, primary probes (1 nM/probe) for a thousand genes, each consisting of a gene-specific binding region, 4 barcoded secondary readout regions, spacers, and a primer pair, were allowed to hybridize to the mRNA targets in 30% hybridization buffer made from 10% dextran sulfate, 2×SSC, and 30% formamide at 37° C. overnight. Each gene is targeted by a minimum of 24 probes. After washing, RNA SPOTs began with the first round of hybridization by flowing in 10 nM readout probes for each color (3 colors in one round of hybridization), and hybridization was allowed to proceed for 30 minutes at room temperature. After a brief wash, anti-bleaching buffer was flowed into the flow cell, and the fluorescent signals were imaged. Then, the fluorescent signals were extinguished by reduction with 50 mM TCEP in 2×SSC with 0.1% Triton X-100 at room temperature for 15 min. The process was repeated until the last round of hybridization.

In this implementation, 16 rounds of imaging were performed with a total of 48 unique secondary readout adapters, each conjugated to one of the 3 fluorophores. The coding space, which includes one round of error correction, accommodates up to 1728 (12³) different targets. In this experiment, a thousand different mRNAs were encoded with the scheme described in FIG. 10. The target mRNAs were chosen from a list of transcription factors (master regulators of gene expression) that are conserved between mouse and human.

FIG. 12 depicts an exemplary raw image from a barcoding experiment. The raw image shows the results from one of the hybridization rounds in the 1000 transcription factor gene experiment. The images from the serial hybridizations were first aligned to obtain the 12-‘color’ barcodes. Then the dots were decoded based on the error correction barcode scheme.
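The step of collapsing serial hybridizations into 12 pseudo-colors can be sketched as an index mapping. In the hypothetical Python sketch below, each spot is assumed to have already been aligned across images, the pseudo-color index is taken as serial hybridization index × 3 + channel index, and a round with no signal is recorded as None for later error correction; the names and data layout are assumptions, not the actual analysis pipeline.

```python
CHANNELS = ("Cy3b", "Alexa594", "Alexa647")  # the three imaging channels named in the text


def pseudo_color(serial_hyb: int, channel: str) -> int:
    """Map (serial hybridization index, channel) to one of 12 pseudo-colors (0-11)."""
    return serial_hyb * len(CHANNELS) + CHANNELS.index(channel)


def assemble_barcode(detections, num_rounds: int):
    """Build one barcode for a single aligned spot.

    detections: iterable of (barcoding_round, serial_hyb, channel) tuples in which
    the spot lit up. Rounds with no detection stay None (a dropped round).
    """
    barcode = [None] * num_rounds
    for rnd, hyb, channel in detections:
        color = pseudo_color(hyb, channel)
        if barcode[rnd] is not None and barcode[rnd] != color:
            # Two different pseudo-colors in one round cannot both be correct.
            raise ValueError(f"conflicting colors in barcoding round {rnd}")
        barcode[rnd] = color
    return tuple(barcode)


spot = [(0, 1, "Alexa594"), (1, 0, "Cy3b"), (3, 3, "Alexa647")]
print(assemble_barcode(spot, 4))  # (4, 0, None, 11)
```

The resulting tuple, with at most one None, is what an error-correcting decoder would then match against the codebook.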

FIG. 13 depicts exemplary results from data analysis, showing counts of 1000 different RNAs decoded with RNA SPOTs on a coverslip and their correlation with bulk RNA-seq Fragments Per Kilobase of transcript per Million mapped reads (FPKM) for NIH3T3 cells. The 753 genes with a non-zero FPKM value and a non-zero RNA SPOTs count were chosen for this correlation. Note that genes with low FPKM values are well detected by RNA SPOTs.

FIG. 14 depicts exemplary results from data analysis, showing the correlation between RNA counts from different fields of view (FOVs). Each dot is a single gene, with its x value being the total number of transcripts counted for that gene in FOVs 1-5 and its y value the counts in FOVs 6-10. A high Pearson's correlation coefficient of 0.9131 is observed. This shows that a few FOVs are sufficient to accurately quantify transcript abundance.
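The split-half comparison of FIG. 14 amounts to summing per-gene counts over two groups of FOVs and computing Pearson's correlation. A minimal Python sketch is shown below, assuming counts are stored per FOV as gene-to-count dictionaries; the toy counts are invented for illustration and are not data from the experiment. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.

```python
from math import sqrt


def pearson_r(xs, ys) -> float:
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def split_fov_totals(counts_by_fov, group_a, group_b):
    """Per-gene total counts summed over two groups of FOVs.

    counts_by_fov: {fov_id: {gene: count}}; genes absent from a FOV count as 0.
    """
    genes = sorted(set().union(*counts_by_fov.values()))
    totals_a = [sum(counts_by_fov[f].get(g, 0) for f in group_a) for g in genes]
    totals_b = [sum(counts_by_fov[f].get(g, 0) for f in group_b) for g in genes]
    return genes, totals_a, totals_b


# Toy counts for illustration only.
counts = {
    1: {"Actb": 50, "Myh9": 20, "Sox2": 1},
    2: {"Actb": 45, "Myh9": 25},
    3: {"Actb": 52, "Myh9": 18, "Sox2": 2},
    4: {"Actb": 48, "Myh9": 22, "Sox2": 1},
}
genes, a, b = split_fov_totals(counts, group_a=[1, 2], group_b=[3, 4])
print(genes, a, b)  # ['Actb', 'Myh9', 'Sox2'] [95, 45, 1] [100, 40, 3]
print(round(pearson_r(a, b), 3))
```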

Imaging-Based Translational Profiling, the RiboCounter

Intact ribosomes contain 28S rRNA and 18S rRNA. By probing against these rRNAs after the initial RNA SPOTs experiment that decodes mRNA identity, one can estimate the number of ribosomes on each mRNA molecule and infer the translational state of the mRNA molecules. By combining this method with RNA SPOTs, we developed imaging-based polysome profiling, which infers the translational state of each detected mRNA molecule by probing against the 28S and 18S rRNA and then decoding the identity of each molecule through RNA SPOTs.

Experimental Design:

Cells were lysed and captured on LNA coverslips. Then, by targeting the 18S and/or 28S rRNA before RNA SPOTs with fluorescent probes, either directly labeled 20-mer smFISH probes or primary probes followed by secondary readouts, one can obtain the ribosome signal intensity on each mRNA molecule. The identity of each RNA molecule was decoded by RNA SPOTs as described above.

Results: FIG. 15 depicts exemplary results from translational profiling analysis, showing the median 28S rRNA intensity extracted from the mRNA molecules of each of the thousand RNA species. 28S rRNA intensities were obtained for all 1000 transcription factor genes.

FIG. 16 depicts exemplary results from translational profiling analysis, showing the distribution of median 28S rRNA intensity on each Myh9 mRNA transcript. A total of 1169 Myh9 transcripts were detected.
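Per-gene summaries such as those in FIGS. 15 and 16 can be obtained by grouping per-molecule rRNA intensities by decoded gene identity. The Python sketch below assumes a list of (gene, intensity) pairs. Converting intensity to a ribosome number requires a single-ribosome calibration that the text does not specify, so `single_ribosome_intensity` is a hypothetical parameter and the linear model is an assumption.

```python
from collections import defaultdict
from statistics import median


def per_gene_rrna_summary(molecules):
    """Group per-molecule rRNA intensities by decoded gene.

    molecules: iterable of (gene, rrna_intensity) pairs, one per decoded mRNA spot.
    Returns {gene: (number_of_transcripts, median_intensity)}.
    """
    by_gene = defaultdict(list)
    for gene, intensity in molecules:
        by_gene[gene].append(intensity)
    return {gene: (len(vals), median(vals)) for gene, vals in by_gene.items()}


def estimate_ribosomes(intensity: float, single_ribosome_intensity: float) -> int:
    """Rough ribosome count, assuming intensity scales linearly with ribosome number."""
    return round(intensity / single_ribosome_intensity)


# Toy values in arbitrary units; the calibration constant is hypothetical.
spots = [("Myh9", 1200.0), ("Myh9", 3100.0), ("Myh9", 2000.0), ("Sox2", 400.0)]
summary = per_gene_rrna_summary(spots)
print(summary)  # {'Myh9': (3, 2000.0), 'Sox2': (1, 400.0)}
print(estimate_ribosomes(summary["Myh9"][1], single_ribosome_intensity=250.0))  # 8
```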

RNA SPOTs of the Transcriptome

To implement RNA SPOTs at the transcriptome level, probes were designed targeting the coding regions of 10,212 mRNAs, with 28 to 32 probes per gene, for a total of 323,156 probes. Probes were stringently designed to avoid off-target and cross hybridization (Supplementary information). The primary probes directly hybridize to the mRNAs captured on a Locked Nucleic Acid (LNA) poly(dT)-functionalized coverslip and contain a set of overhang sequences that specify the barcode unique to each transcript (FIG. 4A). A 12-“pseudo-color” scheme was used such that 4 rounds of barcoding are sufficient to cover the transcriptome (12⁴=20,736), with an additional round of error correction to compensate for one drop in any round of barcoding [Shah et al, 2016].

This design minimizes the barcode length to avoid errors from using long barcodes. The 12 base “pseudo-colors” in each round of barcoding are encoded by a set of 12 readout oligos. Three of the readout oligos were hybridized at a time and imaged in the Cy3b, Alexa 594, and Alexa 647 fluorescence channels, and this was repeated 4 times to iterate through all 12 readout sequences. After each round of hybridization and imaging, the fluorophores were removed by disulfide cleavage, followed by the next round of hybridization. A total of 60 readout oligos in 20 rounds of hybridization were used to decode the 10,212 genes targeted. Every four rounds of hybridization were collapsed onto a single image with 12 pseudo-colors. The barcodes were determined by aligning the 12-color images. A common sequence is present in all primary probes and is targeted by an oligo labeled with Alexa 488 to serve as an alignment marker through all 20 rounds of hybridization. The switching and rehybridization time is short, so the overall speed is limited by imaging speed. Typically, 100 fields of view containing 10⁶ mRNAs can be imaged with 20 rounds of hybridization in a 14-hour period using an automated fluidics system.
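The hybridization and readout budget of this transcriptome design, and of the 1000-gene design described earlier, follows from the number of channels, serial hybridizations, and barcoding rounds. The hypothetical Python data class below reproduces those counts, assuming the error-correction round uses the same 12 pseudo-colors and contributes no additional identity information; the Alexa 488 alignment marker is not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PseudoColorDesign:
    channels: int           # fluorophores imaged per hybridization
    serial_hybs: int        # serial hybridizations per barcoding round
    barcoding_rounds: int   # total barcoding rounds, including error-correction rounds
    parity_rounds: int = 0  # rounds reserved for error correction

    @property
    def pseudo_colors(self) -> int:
        return self.channels * self.serial_hybs

    @property
    def hybridizations(self) -> int:
        return self.serial_hybs * self.barcoding_rounds

    @property
    def readout_oligos(self) -> int:
        return self.pseudo_colors * self.barcoding_rounds

    @property
    def capacity(self) -> int:
        # Error-correction rounds are determined by the others, so they add no codes.
        return self.pseudo_colors ** (self.barcoding_rounds - self.parity_rounds)


transcriptome = PseudoColorDesign(channels=3, serial_hybs=4, barcoding_rounds=5, parity_rounds=1)
print(transcriptome.hybridizations, transcriptome.readout_oligos, transcriptome.capacity)
# 20 60 20736
tf_panel = PseudoColorDesign(channels=3, serial_hybs=4, barcoding_rounds=4, parity_rounds=1)
print(tf_panel.hybridizations, tf_panel.readout_oligos, tf_panel.capacity)
# 16 48 1728
```

Both capacities exceed the number of targeted genes (10,212 and 1000), leaving unused codes that can serve as off-target controls.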

To determine the accuracy of the transcriptome-level measurements, we compared the decoded RNA SPOTs data with RNAseq data in fibroblasts and mESCs and found correlations of R=0.86 and R=0.9, respectively. In addition, RNA SPOTs correlated with gold-standard smFISH quantitation with R=0.86 in mESCs (23 genes) [Singer 2014] and R=0.88 in fibroblasts (7 genes). Two replicates of RNA SPOTs in fibroblasts agreed with R=0.94, indicating that RNA SPOTs is a highly robust and reproducible measurement method. Lastly, comparing genes that were differentially expressed in fibroblasts versus mESCs, we observed the same trend as that detected by RNAseq. For example, pluripotency factors such as Rex1, Esrrb, and Sox2 are highly expressed in mESCs but not expressed in fibroblasts as determined by RNA SPOTs (FIG. 19), similar to the differences observed by RNAseq.

Several genes were observed to be outliers in the comparison between RNA SPOTs and RNAseq (FIG. 19). When we examined these genes directly in cells by single molecule FISH, we observed that their expression levels matched those determined with RNA SPOTs. This indicates that either the hybridization efficiency for these genes was low or they were over-detected by RNAseq.

An advantage of RNA SPOTs is that specific sets of genes can be profiled selectively simply by designing probe sets targeting those genes. In this fashion, ribosomal RNA and highly expressed housekeeping genes can be avoided simply by eliminating those probes from the gene set. As each dot detected in our assay corresponds to a single mRNA, RNA SPOTs is more efficient in terms of imaging than RNAseq, where many sequencing reads are needed to determine the abundance of a transcript.

RNA SPOTs can be scaled down to single cells in combination with microfluidics tools to trap and lyse cells [Bose 2015] or with split-pool molecular indexing methods [Cao 2017, Rosenberg 2017]. With targeted RNA SPOTs, we can choose to probe only the 2000 transcription factors [Fulton 2009] or 1000 informative landmark genes [Donner 2012, Duan 2016] in single cells, instead of profiling the transcriptome, to capture the essential information in cells and to increase the number of cells sampled.

RNA SPOTs can enable many additional methods beyond expression profiling to study RNA variants [Levesque 2013], modifications [Mellis 2017], and RNA binding proteins [Buenrostro 2014], and to profile ribosome occupancy [Ingolia 2009] directly on the captured RNA. Lastly, noncoding RNAs and other RNAs without polyA tails can be captured in hydrogels rather than with dT oligos. As the cost of sequencing is a major limiting factor in genomics experiments, SPOTs enables an accurate and low-cost alternative to sequencing, with many further applications beyond RNA to DNA and proteins.

Example 2 Readout Probes and Rehybridization in Mouse Embryonic Stem Cells

Synthesis of DNA Probe-Disulfide-Dye Conjugates

An exemplary scheme for synthesizing readout probe-dye conjugates connected by a disulfide bond is as follows. Thiol-modified DNA probes were ordered from Integrated DNA Technologies in their oxidized form. 10 nmoles of thiol-modified DNA probes were treated with 10 mM TCEP at 37° C. for 30 minutes. After the reduction step and gel column purification, the DNA probes were mixed with 50 equivalents of 3-(2-pyridyldithio)propionic acid N-hydroxysuccinimide ester (SPDP) linker in 1×PBS solution containing 10 mM EDTA. The mixture was allowed to react at room temperature for at least 2 hours. Immediately after the reaction, the mixture was spin-column purified and resuspended in 60 μL of 1×PBS containing 100 μg of cadaverine dyes. The reaction was allowed to proceed at room temperature for at least 4 hours before purification by ethanol precipitation and HPLC. The concentration of the final product was determined using a NanoDrop spectrophotometer.

Rehybridization in Mouse Embryonic Stem Cells (mESCs).

FIG. 28 illustrates rehybridization reactions in mouse embryonic stem cells (mESCs). To verify that the scheme works both in cells and on coverslips, we performed a proof-of-concept experiment in mESCs targeting introns of the pgk1 gene. During the first round of hybridization, tertiary readout probes conjugated with A647 were applied to identify specific target sequences in mRNAs via secondary bridge binding sites. Real fluorescent spots are shown in the red dashed box (28A). As a control, fluorescence in channel 594 was also measured during the first hybridization: no fluorescent spots were observed (28B). After TCEP treatment and washing steps, channel 647 was reimaged. The observed dim dots are due to non-specific binding of dyes, which does not interfere with subsequent identification of real fluorescent spots (28C). During a second round of hybridization, unique readout probes bearing 594 dye were used to hybridize with the secondary bridge binding sites to give real fluorescent spots (28D). These spots confirmed the positions observed previously when readout probes with A647 were used (28A).

In Vitro Rehybridization with Readout Probes

Transcribed, polyA-tailed dCAS9-EGFP mRNA was captured on dT20 Locked Nucleic Acid (LNA) surface-modified coverslips to show that the rehybridization scheme works in vitro on coverslips. Once again, specific binding was observed. When tertiary probes conjugated with A647 were used, specific signals were observed in channel 647, while few signals were observed in channel 594 (FIG. 29, top row). After TCEP treatment and washing off the first set of tertiary probes, a second set of tertiary probes conjugated with 594 dye was applied during the second round of hybridization. Consequently, specific signals were observed in channel 594, while few signals were observed in channel 647 (FIG. 29, bottom row).

In principle, any heterobifunctional cross-linking reagent that can connect the dye to thiol-modified DNA probes will work for this rehybridization scheme. DNA probe-disulfide-dye conjugates were also synthesized using a 3-(2-pyridyldithio)propionyl hydrazide (PDPH) linker and NHS ester dyes, and these worked as well as the SPDP-based conjugates described above.

Example 3 Brain Slice Analysis

As an illustration, barcodes generated using the error correction mechanisms disclosed herein are used for in situ transcriptional profiling of single cells, which reveals the spatial organization of cells in the mouse hippocampus.

Identifying the spatial organization of tissues at cellular resolution from single cell gene expression profiles is essential to understanding many biological systems. In particular, there is conflicting evidence on whether the hippocampus is organized into transcriptionally distinct subregions. Here, a generalizable in situ 3D multiplexed imaging method was applied to quantify hundreds of genes with single cell resolution via Sequential barcoded Fluorescence in situ hybridization (seqFISH) (Lubeck et al., 2014). seqFISH was used to identify unique transcriptional states by quantifying and clustering up to 249 genes in 16,958 cells. By visualizing these clustered cells in situ, we identified distinct layers in the dentate gyrus corresponding to the granule cell layer, composed predominantly of a single cell class, and the subgranular zone, which contains cells involved in adult neurogenesis. Furthermore, it was discovered that distinct subregions within the CA1 and CA3 are composed of unique combinations of cells in different transcriptional states, instead of a single state in each subregion as previously proposed. In addition, it was revealed that while the dorsal region of the CA1 is relatively homogeneous at the single cell level, the ventral part of the CA1 has a high degree of cellular heterogeneity. These structures and patterns are observed in sections from different mice, as well as in seqFISH experiments with different sets of genes. Together, these results demonstrate the power of seqFISH in transcriptional profiling of complex tissues.

The mouse brain contains about 10⁸ cells arranged into distinct anatomical structures. While cells in these complex structures have traditionally been classified by morphology and electrophysiology, their characterization has recently been aided by gene expression studies. In particular, the Allen Brain Atlas (ABA) provides a systematic gene expression database using in situ hybridization (ISH) of the entire mouse brain, one gene at a time (Dong et al., 2009; Fanselow and Dong, 2010; Thompson et al., 2008). This comprehensive reference provides regional gene expression information but cannot correlate the expression of different genes in the same cell. More recently, single cell RNA sequencing (RNA-seq) has identified many cell types based on gene expression profiles (Darmanis et al., 2015; Tasic et al., 2016; Zeisel et al., 2015). However, while single cell RNA-seq provides useful information on multiple genes in individual cells, it has relatively low detection efficiency and requires cells to be removed from their native environment, resulting in the loss of spatial information. These different approaches can lead to contradictory descriptions of cellular organization in the brain and other biological systems.

In the hippocampus, recent RNA-seq data suggest that CA1 is composed of cells with a continuum of expression states (Cembrowski et al., 2016, Zeisel et al 2015), while ABA analysis indicates that subregions within the CA1 have distinct expression profiles (Thompson et al, 2008). To resolve these two conflicting descriptions of hippocampal organization, a method to profile transcription in situ in the hippocampus with single cell resolution is needed. Here, we demonstrate a general method that enables the mapping of cells and their transcription profiles with single molecule resolution in tissue, allowing an unprecedented resolution of cellular transcription states for molecular neuroscience (FIG. 32A).

A great deal of progress has been made recently in developing highly quantitative methods to profile the transcriptome of single cells. Building upon single molecule fluorescence in situ hybridization (smFISH) (Femino et al., 1998; Raj et al., 2006), Lubeck et al. devised a general method to highly multiplex single molecule in situ mRNA imaging irrespective of transcript density using super-resolution microscopy (Betzig et al., 2006; Rust et al., 2006; Lubeck and Cai, 2012). However, the spectral barcoding methods used in these previous works are difficult to scale up beyond 20-30 genes because of the limited number of fluorophores (Fan et al., 2001; Lubeck and Cai, 2012).

To overcome the scalability problem, a temporal barcoding scheme was developed that uses a limited set of fluorophores and scales exponentially with time (Lubeck et al., 2014). Specifically, by using sequential rounds of probe hybridization on the mRNAs in fixed cells to impart a unique, pre-defined temporal sequence of colors, different mRNAs can be uniquely identified in situ. The multiplex capacity scales as F^(N), where F is the number of fluorophores and N is the number of rounds of hybridization. Thus, one can increase the multiplex capacity by increasing the number of rounds of hybridization with a limited pool of fluorophores. This approach is called Sequential barcoded Fluorescence in situ Hybridization (seqFISH) (Lubeck et al., 2014). In parallel, in situ sequencing methods were developed to directly sequence transcripts in tissue sections, but these methods suffer from low detection efficiency (<1%) (Ke et al., 2013; Lee et al., 2014). Recently, Chen et al. expanded the error correction method in the original seqFISH demonstration by using a Hamming distance 2 based error correcting barcode system, called merFISH. However, this implementation requires longer transcripts (>6 kb) and many more rounds of hybridization than the method described here (Chen et al., 2015b). Furthermore, seqFISH and its variants have only been applied in cell culture systems due to the difficulty of smFISH detection in tissue. Here, an improved version of seqFISH that includes signal amplification and a time-efficient error correction scheme (FIGS. 32A-D, FIG. 40) is demonstrated in complex tissue, resolving the structural organization of the hippocampus with single cell resolution.

Example 4 Brain Slice Analysis with Error Correction

Signal Amplification and Error Correction Enable Robust Detection of mRNAs in Tissues.

To overcome the autofluorescence and scattering inherent to brain tissues, we used an amplified version of smFISH, called single molecule Hybridization Chain Reaction (smHCR) (FIG. 32E) (Shah et al., 2016). Single molecule HCR amplified the signal 22.1±11.5-fold (mean±s.d., n=1288, FIG. 38B) compared to smFISH, enabling robust and rapid detection of individual mRNA molecules in tissues and facile alignment of spots between hybridizations (FIG. 38A). Single transcripts can be detected and localized in 3D with just 24 probes in tissues, enabling detection of transcripts <1 kb in size with a fidelity comparable to the smFISH gold standard (FIGS. 41C-D) but with signals 20-fold brighter (Shah et al., 2016). Single molecule HCR DNA polymers can also be digested by DNase and re-hybridized in brain slices, allowing HCR-seqFISH to be robustly implemented (FIG. 33A). We note that smHCR enables true 3D imaging in tissues, whereas the previous sequential FISH demonstrations (Lubeck et al., 2014, Chen et al., 2015) were performed only in flat cell cultures.

Furthermore, we improved upon our existing barcode system by implementing a time-efficient error correction scheme. The major source of error in seqFISH is the loss of signal due to mis-hybridization, which increases with the number of hybridizations. We introduced an extra round of hybridization to correct for loss of signal during any round of hybridization (FIG. 32D). By minimizing the number of hybridizations, this error correction scheme is efficient in practical implementation. For example, using 5 fluorophores and 4 rounds (instead of 3 rounds) of hybridization to code for 125 genes, we can still uniquely assign barcodes to genes even when the signal from any single round of hybridization is missing. Although merFISH can tolerate 2 errors in the barcodes, it requires 16 rounds of hybridization to code 140 genes (Chen et al. 2015). As increasing the number of hybridizations can potentially lead to more experimental error and analysis complexity, our simple error correction method corrects for the most common error, dropped signal. Also, fewer rounds of hybridization decrease the total imaging time, which is rate-limiting for tissue experiments. HCR-seqFISH with this simpler error-correction scheme allows efficient and accurate quantification of transcription profiles in tissues.
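Decoding with the extra round can be illustrated for the 5-color, 4-round code of 125 genes, assuming the code has the parity form (i, (i+j+k) mod 5, j, k) given by the barcode generator earlier in this disclosure. In the Python sketch below, a single dropped round (None) is reconstructed from the parity relation and the completed barcode is looked up in a codebook; the codebook and gene names are hypothetical. Because this code has a minimum Hamming distance of 2, it can fill in one missing round but can only detect, not correct, a wrong color.

```python
from itertools import product

Q = 5  # colors per round in the 125-gene design


def complete_barcode(observed, q: int = Q):
    """Fill in at most one missing round of an (i, s, j, k) barcode with s = (i+j+k) mod q.

    observed: 4-tuple of colors with None for a round whose signal was lost.
    Returns the full barcode, or None if it cannot be resolved.
    """
    missing = [pos for pos, color in enumerate(observed) if color is None]
    if len(missing) > 1:
        return None  # two or more dropped rounds exceed this code's tolerance
    i, s, j, k = (0 if color is None else color for color in observed)
    if not missing:
        # All rounds seen: accept only if the parity relation holds.
        return tuple(observed) if (i + j + k) % q == s else None
    full = list(observed)
    pos = missing[0]
    if pos == 1:
        full[1] = (i + j + k) % q  # recompute the parity round
    else:
        full[pos] = (s - (i + j + k)) % q  # the missing term was counted as 0 above
    return tuple(full)


def decode(observed, codebook, q: int = Q):
    full = complete_barcode(observed, q)
    return None if full is None else codebook.get(full)


# Hypothetical codebook; gene names are placeholders.
codes = [(i, (i + j + k) % Q, j, k) for i, j, k in product(range(Q), repeat=3)]
codebook = {code: f"gene_{n:03d}" for n, code in enumerate(codes)}

true_code = (1, 1, 2, 3)  # (1 + 2 + 3) % 5 == 1
for dropped in range(4):
    observed = tuple(None if p == dropped else c for p, c in enumerate(true_code))
    assert decode(observed, codebook) == codebook[true_code]
print(decode((1, 2, 2, 3), codebook))  # None (parity violated)
```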

Using this HCR-seqFISH method, we surveyed the regional and subregional transcriptional heterogeneity within the temporal and parietal cortex and hippocampus of the mouse brain by imaging similar coronal sections collected from 3 different animals. Two similar sections from separate mice were profiled with probes for 125 genes, while one additional brain slice was imaged for 249 genes. In each of the coronal slices, 60-80 fields of view, each 216 μm×216 μm×15 μm, were imaged in the cortex and hippocampus (FIG. 32A and FIG. 41E). For the 125-gene set, 56 of the genes (FIG. 32D, FIG. 40) were selected because they showed spatially heterogeneous expression based on the ABA (Lein et al., 2007), another 44 were selected from a list of transcription factors, and 25 marker genes were selected from single cell RNA-seq datasets (Zeisel et al., 2015). One hundred of these genes were barcoded by 4 rounds of hybridization (FIG. 32B). The remaining 25 high-abundance genes were measured individually using 5-color smHCR in 5 serial rounds of hybridization (FIG. 32C). This hybrid approach, measuring medium-expression genes with barcoding seqFISH and high-copy-number genes serially in subsequent hybridizations, allows a large dynamic range of transcripts to be profiled in the same cell.

seqFISH is an Accurate and Efficient Method to Multiplex RNA In Situ.

To determine the accuracy of the seqFISH method in quantifying mRNA levels in single cells in tissue, we compared the copy number of 5 of the 100 target genes measured by barcoding to the copy number found by colocalized smHCR detection in the same cell (FIG. 33B, FIG. 42A) in 15 μm brain sections. We found that the copy numbers of the RNAs per cell as measured by barcoding and smHCR agreed with an R-value of 0.85 and a slope of 0.84 (N=3851). As colocalized smHCR matches smFISH transcript quantitation (Shah et al., 2016), the barcoded seqFISH method can quantify mRNA molecules in single cells with 84% efficiency compared to the gold standard of smFISH. In comparison, single cell RNA-seq measurements are 5-20% efficient based on spike-in controls, and in situ sequencing is less than 1% efficient (Darmanis et al., 2015; Klein et al., 2015; Lee et al., 2014; Macosko et al., 2015; Tasic et al., 2016; Zeisel et al., 2015; Stahl et al., 2016). This high detection efficiency results from a low transcript drop rate and a high barcode recovery rate due to the error correction round of hybridization. In our experiment, 78.9% of barcodes (N=2,115,477 barcodes) were found in all 4 hybridization rounds and 21.1% were identified in 3 out of the 4 hybridizations (FIG. 33C), indicating that the probability of detecting a given mRNA molecule is 94% in each round of hybridization (FIG. 42B).
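The step from "78.9% of barcodes seen in all 4 rounds" to "94% per round" can be reproduced with a simple model. Assuming, as a simplification not spelled out in the text, that each round detects a molecule independently with the same probability p and that a barcode is recovered when it is seen in at least 3 of 4 rounds, the fraction seen in all rounds is p⁴ / (p⁴ + 4p³(1−p)), which can be solved for p. Under the same model, error correction raises the chance of recovering a molecule from about 77% (all four rounds required) to roughly 98%. The function names are hypothetical.

```python
def per_round_detection(frac_all_rounds, n_rounds=4):
    """Per-round detection probability p implied by the share of recovered
    barcodes that were seen in every round.

    Model (assumption): rounds detect a molecule independently with
    probability p, and a barcode is recovered if seen in >= n_rounds - 1 rounds:
        f = p**n / (p**n + n * p**(n - 1) * (1 - p)) = p / (p + n * (1 - p)),
    which rearranges to p = n * f / (1 + (n - 1) * f).
    """
    f, n = frac_all_rounds, n_rounds
    return n * f / (1 + (n - 1) * f)


def recovery_probability(p, n_rounds=4):
    """Chance a molecule is seen in all rounds or in all but one round."""
    n = n_rounds
    return p**n + n * p ** (n - 1) * (1 - p)


p = per_round_detection(0.789)  # 78.9% of barcodes seen in all 4 rounds
print(round(p, 3))              # 0.937
# With one dropout tolerated vs. requiring all four rounds:
print(round(recovery_probability(p), 3), round(p**4, 3))
```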

To quantify the amount of false positive signal due to misalignment of barcodes and nonspecific binding of probes, we measured the number of off-target barcodes detected. With four rounds of hybridization and 5 fluorophores, there were 5⁴=625 unique codes. 100 of these barcodes were assigned to measure mRNAs, which were detected at 914.8±570.5 counts per cell (mean±s.d., N=3439). In comparison, the 525 remaining off-target barcodes that were not used were detected at 4.6±4.7 (mean±s.d., N=3439) counts per cell (FIG. 33D). Because of this roughly three-orders-of-magnitude difference in per-barcode detection (on-target vs. off-target), false positives due to chance alignment of nonspecifically bound spots contributed minimally to the barcode readouts. The false positives we observe fall only on barcodes a Hamming distance of one away from on-target barcodes, yet they contribute minimally to undercounting of on-target barcodes (FIG. 33E). Furthermore, even the most frequent off-target barcode was observed 65.57 times less frequently than the least frequent mRNA-coding barcode (FIG. 33E, FIG. 42). Even though 24.8±0.4% (mean±s.e., N=4 rounds of hybridization) of the spots in each round of hybridization were nonspecifically bound probes, barcode mis-assignments did not occur frequently because nonspecifically bound probes do not reappear in the same location after digestion with DNase and re-hybridization (FIG. 33A). Together, the quantifications of false positive and false negative barcodes demonstrate that this method is highly efficient and accurate at detecting RNAs in situ in single cells within tissues.
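Using the per-cell means reported above, 914.8/100 ≈ 9.1 counts per on-target barcode versus 4.6/525 ≈ 0.009 counts per unused barcode, a roughly 1,000-fold difference. The sketch below shows one way such a summary, together with the Hamming distance of each off-target hit to the nearest on-target code, could be computed from decoded spots. The input format (one tuple of color indices per decoded spot) and the function names are assumptions for illustration.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of rounds in which two barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def off_target_report(decoded, on_target, n_colors=5, n_rounds=4):
    """Compare decoded barcodes with the set assigned to genes (hypothetical helper).

    decoded   : iterable of barcode tuples, one per decoded spot
    on_target : set of barcode tuples assigned to genes
    Returns (mean count per on-target barcode, mean count per unused barcode,
             Counter of off-target counts keyed by Hamming distance to the
             nearest on-target barcode).
    """
    counts = Counter(decoded)
    all_codes = product(range(n_colors), repeat=n_rounds)
    off_target = [c for c in all_codes if c not in on_target]
    on_rate = sum(counts[c] for c in on_target) / len(on_target)
    off_rate = sum(counts[c] for c in off_target) / len(off_target)
    by_distance = Counter()
    for code in off_target:
        if counts[code]:
            nearest = min(hamming(code, t) for t in on_target)
            by_distance[nearest] += counts[code]
    return on_rate, off_rate, by_distance
```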

Cell Clusters are Based on Combinatorial Expression Profiles.

We imaged the expression of 125 genes in coronal sections from two mice for a total of 14,908 cells (FIG. 41E). Cortical and hippocampal cells were segmented based on DAPI and Nissl staining. A tessellation algorithm was developed to accurately segment densely packed cells in the hippocampus. To avoid capturing mRNA from neighboring cells, we contracted the borders of cells determined by the segmentation algorithm by 10%.
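The 10% border contraction can be sketched for a polygonal cell outline. The text does not say how the contraction was done; this minimal sketch assumes each vertex is pulled toward the vertex centroid by 10% and then assigns an mRNA spot to the cell with an even-odd point-in-polygon test. Both helpers are hypothetical.

```python
import numpy as np


def contract_polygon(vertices, fraction=0.10):
    """Shrink a cell outline toward its vertex centroid by `fraction`.

    Assumption: "contracting the border by 10%" is modelled as scaling every
    vertex's offset from the vertex centroid by (1 - fraction).
    """
    v = np.asarray(vertices, dtype=float)
    centroid = v.mean(axis=0)
    return centroid + (1.0 - fraction) * (v - centroid)


def point_in_polygon(point, vertices):
    """Even-odd (ray-casting) test: is the mRNA spot inside the outline?"""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
shrunk = contract_polygon(square)  # corners move to 0.5 and 9.5
print(point_in_polygon((0.2, 5), square), point_in_polygon((0.2, 5), shrunk))  # True False
```

A spot near the border (x = 0.2) is counted for the original outline but excluded after contraction, which is the intended effect of trimming the margin shared with neighboring cells.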

To group the single cell data into distinct transcriptional states, we Z-score normalized the copy number of each transcript in every cell (FIG. 34A) and hierarchically clustered the cells to identify cells with similar expression patterns (FIG. 43). Many of these clusters, based on overall expression patterns, contain clear transcriptional markers of known cell types previously identified by single cell RNA-seq (FIG. 34B) (Zeisel et al., 2015; Tasic et al., 2016). Cell clusters 12 and 13 showed clear expression of Gja1, which marks astrocytes (Zeisel et al., 2015; Tasic et al., 2016). Cluster 12 also expressed Mfge8 while cluster 13 did not, indicating two distinct populations of astrocytes (FIG. 34B). There are further subclusters within each of the astrocyte populations with different spatial localization patterns (FIG. 43C). Cluster 11 cells expressed Laptm5, a known microglia marker (Zeisel et al., 2015; Tasic et al., 2016). Cluster 3 expressed interneuron genes, while clusters 1-2 and 4-5 expressed genes associated with pyramidal neurons (Zeisel et al., 2015; Tasic et al., 2016). Some clusters contained many distinct subclusters, such as Amigo2-enriched mural cells (cluster 9.4) or Omg-expressing oligodendrocytes (clusters 10.4 and 10.5). The major clusters were robust to downsampling the number of cells used in clustering (FIG. 44), with some of the hippocampal pyramidal and glia clusters robustly defined even with 400 cells. Similarly, principal component analysis (PCA) visualization of the data (FIG. 43F) recapitulated the major clusters that corresponded to astrocyte, microglia, cortical pyramidal, hippocampal pyramidal, dentate gyrus (DG) granule, and interneuron cells.
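The normalization and clustering step can be sketched with NumPy and SciPy. This is a minimal illustration, not the pipeline used here: it assumes each gene is Z-scored across cells and uses Ward linkage, since the text does not name the linkage or distance. `zscore_genes` and `cluster_cells` are hypothetical names.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def zscore_genes(counts):
    """Z-score each gene (column) across cells (rows)."""
    counts = np.asarray(counts, dtype=float)
    sd = counts.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant genes at 0 instead of dividing by zero
    return (counts - counts.mean(axis=0)) / sd


def cluster_cells(counts, n_clusters):
    """Agglomerative clustering of cells on Z-scored profiles.

    Assumption: Ward linkage on Euclidean distance; the source does not
    state which linkage or distance it used.
    """
    tree = linkage(zscore_genes(counts), method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Usage on synthetic counts (hypothetical cell types A and B):
rng = np.random.default_rng(0)
a = rng.poisson(5, size=(20, 6)); a[:, :3] += 50
b = rng.poisson(5, size=(20, 6)); b[:, 3:] += 50
labels = cluster_cells(np.vstack([a, b]), n_clusters=2)
```

`fcluster` with `criterion="maxclust"` cuts the tree into at most the requested number of flat clusters; sub-clusters such as 9.4 or 10.5 would correspond to cutting a branch again at a finer level.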

Cell Clusters Show Distinct Regional Localization

Many neuronal clusters mapped to distinct regions in the brain (FIG. 34B). Several classes of pyramidal cells (clusters 1-2) showed exclusive localization to the hippocampus, while other classes (clusters 4-5) showed predominantly cortical localization. There was also a class of cells (cluster 7) that was almost exclusively present in the DG. Interestingly, these clusters segregated based solely on gene expression profiles, without any spatial information being added to the clustering algorithm. These differences in the transcriptional states of neurons could be due to intrinsic differences in the cells or to different local environments and activity patterns.

In contrast, astrocyte, microglia and other non-neuronal cell clusters were generally uniformly present in all areas of the brain (FIG. 34B). However, subclusters of astrocytes did preferentially localize to different regions of the brain (FIG. 43C), with subcluster 12.3 localized preferentially to the cortex, while subcluster 12.1 was uniformly distributed. Similarly, cluster 9 contains subclusters (9.3, 9.5 and 9.6) that localize exclusively to the DG, while another subcluster (9.1) localizes almost exclusively to the cortex. The regional localization of neurons is especially pronounced, with clusters 1 and 2 localized almost exclusively to the hippocampus and some of their subclusters localized predominantly to the CA3. Furthermore, while pyramidal cell clusters 4 and 5 are preferentially cortically localized, the few hippocampal cells in these clusters form their own subclusters (4.4 and 5.4) (FIG. 43C). In cluster 6, many subclusters with distinct expression profiles are localized almost exclusively in the CA1, CA3 or the DG (FIGS. 34C, 43C). In contrast, cluster 7 cells show a relatively homogeneous regionalization pattern, but further subdivide based on combinatorial expression patterns (FIG. 34D). Subclusters of cluster 9 also show significant regionalization, where subclusters 9.1, 9.3, 9.5, and 9.6 show localization to the SGZ (FIG. 43C). Overall, cell clusters with similar expression profiles exhibited similar spatial localizations across the brain, with a correlation coefficient of 0.67 (FIG. 43E), indicating the existence of archetypal regional expression patterns and potential spatial markers in the brain. These results show that the tissue-optimized HCR seqFISH approach can directly identify a variety of transcriptional states and quantify broad spatial patterns of expression.

Combinatorial Expression Patterns Define Fine Clusters.

While certain cell clusters contain strong expression of marker genes, not all clusters are defined based on a few genes. How much power do individual genes or groups of genes have in explaining the observed cell clusters? To understand this, we examined whether subsets of genes can recapitulate the observed clusters (FIG. 34E). We found that any set of 25 genes recovers about half of the correlation structure in the cell-to-cell correlation map (FIGS. 34E, 45B-C, and 44, N=10 bootstrap replicates). The fact that any selection of 25 genes can explain the gross patterns in the data is likely due to the high correlations among the expression patterns of genes, as shown in the gene-to-gene correlation map (FIG. 45A). Thus, a small subset of the measured genes can provide sufficient information to infer the gross transcriptional states of the cells. Interestingly, this may also be why low-coverage single cell sequencing methods such as Drop-seq and inDrop (Klein et al., 2015; Macosko et al., 2015) can capture the large distinctions between cell types: many highly expressed genes are correlated with other genes that collectively define cell types.
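The gene-subset analysis can be sketched as follows. The exact recovery metric is not given in the text; this sketch assumes it is the Pearson correlation between the upper triangles of the cell-to-cell correlation maps computed from a random gene subset and from all genes, averaged over bootstrap draws. `subset_recovery` is a hypothetical name.

```python
import numpy as np


def upper(m):
    """Entries above the diagonal of a square matrix, as a flat vector."""
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j]


def subset_recovery(z, n_genes, n_boot=10, seed=0):
    """Mean and s.d. of the agreement between subset and full cell-to-cell maps.

    z : cells x genes matrix (e.g. Z-scored copy numbers)
    Assumption: agreement = Pearson correlation of the two maps' upper triangles.
    """
    rng = np.random.default_rng(seed)
    full = upper(np.corrcoef(z))  # rows are cells, so this is cell-to-cell
    scores = []
    for _ in range(n_boot):
        genes = rng.choice(z.shape[1], size=n_genes, replace=False)
        sub = upper(np.corrcoef(z[:, genes]))
        scores.append(np.corrcoef(full, sub)[0, 1])
    return float(np.mean(scores)), float(np.std(scores))
```

Calling `subset_recovery(z, 25)` on a cells x 125 genes matrix would give a figure comparable in spirit to the "about half" quoted above, but the numbers depend on the metric chosen.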

At the same time, the finer correlation structure in the data, required to define the cell clusters accurately, can only be captured with more genes (FIGS. 34F, 45B-C). Consistent with this, using a “random forest” machine learning algorithm (Breiman, 2001) to classify cell clusters, we found that 75 genes are needed to classify cells with 50% accuracy, indicating that correct cluster assignment requires more detailed information from many genes (FIG. 34E). Supporting this view, the first 10 principal components (PCs) explained 59.5% of the variation in the data, while the rest of the variation required the remaining 115 PCs (FIGS. 34F, 43D). The “random forest” algorithm required 10 PCs to predict the cell cluster assignments with 50% accuracy (FIG. 34F), but accuracy steadily increased with more PCs. These observations indicate two levels of information in the data: a coarse level, where large distinctions between cell clusters are observable with a few genes, and a fine level, where subtle distinctions require many more genes.
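The random forest classification is not reproduced here. As a minimal sketch of the PCA side of the argument only, the code below computes the fraction of variance explained by each principal component via a NumPy SVD, and the smallest number of components whose cumulative share reaches a threshold such as the 59.5% quoted above. The function names are hypothetical.

```python
import numpy as np


def explained_variance_ratio(z):
    """Fraction of total variance carried by each principal component."""
    centered = z - z.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values
    var = s**2
    return var / var.sum()


def n_pcs_for(z, threshold):
    """Smallest number of PCs whose cumulative explained variance >= threshold."""
    cum = np.cumsum(explained_variance_ratio(z))
    return int(np.searchsorted(cum, threshold) + 1)
```

For a cells x 125 genes matrix, a result of 10 from `n_pcs_for(z, 0.595)` would correspond to the statement that the first 10 PCs carry 59.5% of the variance.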

These results suggest two experimental points. First, multiplexing at the level of 20 genes by seqFISH can give broad cell cluster identification that is not available with 2-3 gene smFISH experiments. Although single marker genes are useful for inference, we find that they frequently are not sufficient for cell classification. For example, all DG-specific granule cells (cluster 7) have Gpc4 and Vps13c as their enriched marker genes (FIG. 34B); yet Gpc4 and Vps13c are also strongly expressed in other hippocampal cells outside of the DG, as seen in both our experiments and the ABA. Thus, smFISH against Gpc4 and Vps13c alone would not be sufficient to uniquely identify the DG granule cells. Furthermore, even the strongly bimodal markers that are known to define cell types (e.g., Mfge8, Gja1) are correlated enough with overall expression profiles that cells fall into the appropriate cluster even when these genes are excluded. This suggests that while marker genes can be essential in assigning a cell to a known cell type, they are not necessary to identify unique clusters in the dataset, provided enough measurements are made. Second, accurate measurement of the combinatorial expression of many genes, enabled by seqFISH, can allow for more specific cell cluster identification. As a comparison, in single cell RNA-seq data, CA1 pyramidal cells are grouped into a single cluster (Zeisel et al., 2015; Habib et al., 2016), potentially because of the relatively lower detection efficiency of the method. In our seqFISH experiments, measuring hundreds of genes quantitatively, we can resolve several clusters and subclusters with robust regionalization within the CA1 (FIGS. 34B, 43C).

Cells are Patterned in the Dentate Gyrus.

To further visualize the spatial organization of cells, we mapped the cluster definitions of cells back onto the images. In the DG, we observed a striking laminar layering of cell classes. The two blades of the DG (FIGS. 35A-B) showed mirror arrangements of cells, with cluster 9 cells forming the subgranular zone (SGZ), leading into a granule cell layer (GCL) dominated by a single cluster of granule cells (cluster 7) (FIG. 34B). In the 125 gene data set, the cells of the GCL were found to be dominated by expression of Gpc4 and Vps13c, matching ISH data from the ABA (FIG. 48B). Cluster 7 was found to be further subdivided into 6 subclusters (FIG. 40C). These subclusters were found to have varying levels of calbindin D-28K (Calb1) expression, which is known to increase with granule cell maturation (FIG. 34D) (Yang et al., 2015). On the other hand, the cells of the SGZ were found to be significantly enriched in astrocyte markers such as Mfge8 and Mertk, which has also been observed previously (Miller et al., 2013) and in the ABA data. However, these cells do not cluster with typical astrocytes (clusters 12 and 13) because their combinatorial expression patterns differ from those of astrocytes, consistent with their classification as a completely different population of cells.

In the fork region of the DG, the layer of cluster 9 cells appeared on the interior surface of the fork, followed by a layer of granule cells (cluster 7) (FIG. 35C). A different layering pattern is seen at the crest of the DG, where astrocytes, microglia, and some other glial cells line the exterior of the crest, ensheathing the GCL (FIG. 35D). In both brains of the 125 gene experiments, the same cell clusters and spatial arrangements are observed. Furthermore, because the mRNAs are imaged in 3D in the 10-15 μm brain slices, we can obtain a 3D view of the expression profiles, shown in the fork regions of the DG (FIG. 35F).

Distinct Regions of CA1 and CA3 are Composed of Different Combinations of Cell Clusters.

While each region of the DG contains similar compositions of cells, distinct subregions within the CA1 and CA3 contained different combinations of cell classes (FIGS. 36, 46F). In the CA1, there were 3 distinct regions defined by their individual cellular compositions. In the dorsal region of CA1 (CA1d), neuron cluster 6 (enriched in Nell1, a protein kinase C binding protein) was the major cell type in the pyramidal layer, with astrocyte, microglia and other cells (clusters 10-13) intercalating into the stratum pyramidale (SP) (FIGS. 36A-C). Transitioning into the CA1 intermediate region (CA1i) (FIG. 36D), pyramidal cell cluster 4 displaced cell cluster 6 as the dominant cell type, with the co-appearance of cluster 1 and 2 pyramidal cells.

As the middle of the CA1i region was reached, a small number of cluster 4 pyramidal cells remained, while cluster 1 and 2 pyramidal cells dominated (FIGS. 36E-F). Clusters 1 and 2 are enriched in Nell1 (EGF-like protein), Npy2r (neuropeptide Y receptor), Slc4a8 (sodium bicarbonate transporter) and B3gat2 (glucuronosyltransferase). The CA1i region displayed a characteristic spatial organization in which glial cells line the outermost regions, while pyramidal cell clusters 1 and 2 longitudinally partitioned the pyramidal layer. This separation of the inner versus the outer layers of CA1 matches that observed previously (Dong et al., 2008). Furthermore, interneurons (cluster 3) were found to preferentially line the inner edge of the pyramidal layer in the CA1i region (FIGS. 36E-F). This patterning of interneurons, particularly subcluster 3.1 cells, which were enriched in Slc5a7, a choline transporter, was consistent with the patterning of cholinergic interneurons observed with ChAT-GFP labeling (Yi et al., 2015). Finally, the largest amount of heterogeneity in the CA1 was seen in the ventral CA1 region (CA1v), where cell clusters 3, 5, and 10 began to mix in with clusters 1 and 2 (FIGS. 36G-I).

Similarly, the CA3 was found to have four transcriptionally distinct regions with different pyramidal cell compositions and abrupt transitions. The ventral-most region of CA3 contained a high level of heterogeneity of pyramidal cell clusters (FIGS. 36J-K), while the intermediate region of CA3 contained a mixture of cell clusters 1 and 2 (FIGS. 36L-M). As the CA3 progressed towards the hilus of the DG, the cell types transitioned first to primarily cluster 4 neurons (enriched in Dcx, doublecortin, and Col5a1, a collagen), and then to almost exclusively cluster 6 neurons in the region most proximal to the DG hilus (FIGS. 36O-P). It is interesting to note that while cluster 6 cells appear in both the CA1 (subcluster 6.8) and CA3 (subclusters 6.1 and 6.4), the subclusters of cluster 6 show distinct regional localization (FIG. 43C), suggesting that the gene expression differences between CA1 and CA3 cells are captured in the seqFISH data.

The regionalized expression patterns we observed in the hippocampus closely match those observed in previous literature (Thompson et al., Neuron 2008; Dong et al., PNAS 2009). For example, the CA1d, CA1i, and CA1v boundaries correspond to the boundaries shown in FIG. 2B of Dong et al. In CA3, the subregions observed in our experiment match CA3 subregions 4-7 in Thompson et al. (Thompson et al., 2008).

Lastly, we note that the two slices from two different mice in the 125 gene experiment show not only the same subregional structure (FIGS. 35-37), but also the same clusters of cells (FIGS. 36 and 37) in the different subregions of the hippocampus (FIG. 46). In both brains, the CA1d consists of a relatively homogeneous population of cluster 6 cells, which transitions to a mixture of cluster 1 and 2 cells in the CA1i, and finally to a mixture of cluster 1-6 and 10 cells in the CA1v (FIG. 46F). Together, these results show that the subregions of the hippocampus are a robust feature of the organization of CA1 and CA3, consisting of cell classes with distinct expression profiles. The stereotypical nature of the spatial arrangement of these structures suggests further experiments with seqFISH and other functional assays to probe the distinct functions of the different cell clusters in the CA1 and CA3.

249 Gene Multiplex Experiments Show the Same Hippocampal Subregions

To further show that the sub-regional structure of the hippocampus is independent of the target genes, we performed a 249 gene seqFISH experiment on a third coronal section. Of these 249 genes, only 22 overlapped with the 125 gene set. For this set of genes, 214 were selected from a list of transcription factors and signaling pathway components, and the remaining 35 were selected from cell identity markers from another single cell RNA-seq dataset (Tasic et al., 2016). The 214 genes were barcoded by 5 rounds of hybridization, while the remaining genes were imaged in 7 rounds of non-barcoding serial hybridization. To quantify the efficiency of this experiment, 4 genes in the barcoding set (Smarca4, Sin3a, Npas3, and Neurod4) were re-probed with smHCR. The barcoding efficiency of the 249 gene probe set was found to be 71%, with an R value of 0.80 (FIG. 46D). In single cells, we detect on average 2807±1660 (mean±s.d., N=2050 cells) total barcodes.

The same arrangement in the DG was observed in the 249 gene experiment despite the different genes used, indicating robust identification of the layering in the DG by seqFISH (FIGS. 38S-T). In particular, the cells in the SGZ are clustered independently from cells in the GCL, similar to the layers observed in the 125 gene experiment. In the SGZ cells, we observed enrichment of Sox11, a key transcription factor in neurogenesis (Miller et al., 2013). Other transcription factors involved in neurogenesis, NFIA and Tbr1, are also enriched in the SGZ cells, as seen in our data and the ABA images (FIG. 48A). The observation of this distinct layer in both the 249 and 125 gene experiments, together with the combined gene enrichment pattern (increased Sox11, Sox9, NFIA, and Tbr1 in the 249 gene experiment and increased Mertk and Mfge8 in the 125 gene experiment), suggests that many cells in this layer are involved in adult neurogenesis in the SGZ.

In addition, the same regionalized cellular patterns are observed in CA1d, CA1i, and CA1v, where different subregions use different cell classes in characteristic ratios (FIG. 46F). As in the 125 gene experiment, the CA1d uses only a few cell classes and is relatively homogeneous, while the CA1v region is made up of many different cell classes, resulting in a high level of cellular heterogeneity. Furthermore, the distinction between CA1 and CA3 cell clusters is clearer in the 249 gene experiment, suggesting greater resolving power for spatial patterns (FIGS. 38A-K). The 249 gene experiment also suggests that the CA3 may be composed of 3-4 subregions based on cell cluster composition (FIGS. 38L-R). The cellular heterogeneity of the CA3 is again shown to mirror that of the CA1, where the cellular heterogeneity increases along the dorsal to ventral axis. Cells with distinctive marker gene expression in the hippocampus are shown in supplementary FIG. 38A.

Single Cell Data Resolves Cellular Organizations in the Sub-Regions of the CA1 and CA3.

Two conflicting views of the cell types in the hippocampus have been proposed based on analysis of the Allen Brain Atlas data (Thompson et al., 2008) as well as recent RNA-seq data (Cembrowski et al., 2016; Zeisel et al., 2015). Analysis of the ABA in situ data showed that distinct subregions of the hippocampus expressed different molecular markers, indicating that the CA1 and CA3 are “regionalized” into distinct sub-structures (Fanselow and Dong, 2010; Thompson et al., 2008). However, recent bulk RNA-seq experiments on the CA1 found that gene expression patterns changed gradually along the dorsal to ventral axis, contradicting the sharp boundaries observed in the ABA analysis (Cembrowski et al., 2016). Further supporting this “continuous” cell type view of the hippocampus, analysis of the single cell RNA-seq data (Zeisel et al., 2015) identified a single continuous population of cells in the CA1 region.

Our data provide a single cell resolution picture of the spatial organization of cells in the hippocampus and reconcile the RNA-seq and ABA data. While our data mostly support a regionalized view of the hippocampus, we observe that a single cell class does not, in general, define CA1 and CA3 subregions. Instead, we observed that different subregions of CA1 and CA3 are composed of distinct combinations of cell clusters (FIGS. 33-35). For example, CA1d consists primarily of cluster 6 pyramidal cells (FIGS. 36A-C), in addition to cluster 1, 2, 10, and 12 cells, while CA1v consists of a large set of cell classes, including cluster 1-6 and 10 cells, but at different relative abundances (FIGS. 36-37, FIGS. 46F-G). Because of this intermixing of cell classes in each subregion, a bulk measurement of transcription profiles would find a lack of regionalization, whereas single cell analysis with spatial resolution would identify these distinct regions based on their unique cell class compositions. Indeed, when we averaged the single cell expression profiles within each subregion of the CA1, we could reproduce the continuous correlation profiles found by bulk RNA-seq between CA1v, CA1i, and CA1d (FIG. 39) (Cembrowski et al., 2016). The bulk RNA-seq observation that CA1i lacked specific marker genes can also be explained: it is consistent with our finding that CA1i contains cell classes present in both CA1d and CA1v (FIGS. 36-38).
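The pseudo-bulk comparison described above (averaging single-cell profiles within each subregion, then correlating the averages) can be sketched in a few lines. The input layout (a cells x genes array plus one region label per cell) and the function names are assumptions for illustration.

```python
import numpy as np


def pseudo_bulk(expr, regions):
    """Average single-cell profiles within each sub-region.

    expr    : cells x genes array of copy numbers
    regions : one region label per cell (e.g. "CA1d", "CA1i", "CA1v")
    Returns (sorted region labels, region x genes array of mean profiles).
    """
    expr = np.asarray(expr, dtype=float)
    regions = np.asarray(regions)
    labels = sorted(set(regions.tolist()))
    means = np.vstack([expr[regions == r].mean(axis=0) for r in labels])
    return labels, means


def region_correlations(expr, regions):
    """Pearson correlation between the pseudo-bulk profiles of each region pair."""
    labels, means = pseudo_bulk(expr, regions)
    return labels, np.corrcoef(means)
```

Because each regional average mixes several cell classes in different proportions, neighboring regions that share classes produce intermediate correlations, which is how a gradual bulk profile can arise from discrete single-cell classes.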

This organization of cell classes is observed in both the 125 gene experiments and the 249 gene experiment. It is worth noting that the complexity of the cell populations observed in the CA1d versus the CA1v matches the functional differences in CA1. CA1d is responsible for spatial learning and navigation, contains a higher concentration of place cells, and sends projections to the dorsal subiculum and cortical retrosplenial area (Cenquizca and Swanson, 2007; Jung et al., 1994; Risold et al., 1997; O'Keefe and Dostrovsky, 1971). We observed that CA1d is composed of a relatively homogeneous population of cells, predominantly cluster 6 cells. In contrast, the ventral region is involved in a variety of cognitive tasks, such as stress response and emotional and social behavior (Cenquizca and Swanson, 2007; Jung et al., 1994; Fanselow and Dong, 2010; Kishi et al., 2006; Muller et al., 1996; Petrovich et al., 2001; Pitkänen et al., 2000; Saunders et al., 1988; Witter and Amaral, 1991; Yi et al., 2015). Correspondingly, we observed a large set of cell classes in the CA1v region. It is intriguing to hypothesize that the different cell classes identified based on molecular profiles may correspond to neurons with distinct connectivity and functional patterns. This hypothesis can be investigated in future experiments combining anterograde tracing and electrophysiological recording followed by seqFISH.

A list of the 249 genes being analyzed can be found in the following Table 3.

TABLE 3
Name of Genes being analyzed:
Tal1 Dmbx1 Emx2 Uncx Paxip1 Ctnnb1 Prdm1 Rybp Nfkb2 Tfdp2 Grhl1 Sp8 Irf2 Zfp287 Esr2 Zfp128 Vav1 Sp1 Ppargc1b Sp7 Pin1 Nfya Vsx1 Klf1 Vsx2 Mybl1 Mybl2 Rnf2 Blzf1 Topors Nr3c2 Nfia Taf6l Nr4a3 Hoxd12 Hoxd13 Ttf1 Sox9 Nr2e1 Polr2b Hltf Sox6 Pbx3 Sox5 Foxa1 Cdc5l Cebpg Ciita Rest Ets1 Mafk Tbx15 Scml2 Myb Clock Rbpj Foxc1 Zfp422 Pias3 Runx1 Ppara Relb Vdr Cdc6 Arid3a Lhx1 Hoxb8 Hoxb9 Hic1 Lhx6 Six4 Hoxb3 Zfp263 Cbfa2t3 Ehf Nhlh1 Gata6 Gata4 Gata5 Lpp Nfe2l3 Nfe2l2 Tmf1 Gli1 Tbx2 En1 En2 Hnf1a Tbx4 Zfp423 Elf1 Foxb1 Elf2 Elf4 Mxd1 Wt1 Rfx4 Bhlhe41 Sox13 Taf4b Rfx2 Sox17 Ahr Sall4 Med14 Smyd1 Sall3 Arid2 Zfp64 Pgr Trps1 Hoxa1 Bach2 Bach1 Notch3 Pknox1 Pknox2 Sin3a Etv3 Smad9 Smad5 Alx1 Egf Mn1 Nkx3-1 Rbak Gabpa Nfkbiz Zscan21 Trp73 E2f7 Esrrg Rbpjl Nfatc4 Nr5a1 Neurod4 Esrrb Tbx21 Rorc Mitf Pax7 Pax6 Pax1 Pax3 Pax2 Pax9 Zkscan17 Gfi1 Mzf1 Runx3 Smarca4 Foxd4 Foxd3 Creb1 Srebf1 Sox11 Gmeb2 Irx4 Pou3f2 Ikzf1 Tcf23 Mtf2 Npas3 Nfatc3 Nfil3 Phox2b Plag1 E2f2 Ddx3x Taf2 Pou4f1 Trim33 Tsc2 Lmx1a Nr2f2 Eomes Wwtr1 Foxo1 Ar Zfp354a Elk4 Foxo4 Sall1 Mycn Maml3 Foxp3 Atm Uaca Tbr1 Pm1 Lhx3 Atr Zbtb33 Ptch1 Lhx5 Barhl1 Irx5 Tfap2b Tfap2e Rxra Rxrb Gli2 Gli3 Zic4 Zic5 Zic2 Zic3 Satb1 Onecut2 Foxn4 Mnat1 Foxn1 Dlx2 Vezf1 sncg sst th vip xdh slc17a8 slc5a7 slc6a3 slc6a8 smad3 opalin pdgfra pvalb reln slc17a7 lyve mfge8 mog myl14 ndnf ctss foxj1 gad1 htr3a igtP acta2 aldh1l1 camk2 chat cldn5 ngef tiam1 slc1a2 gja1 fbl11

seqFISH Provides a Generalized Method to Multiplex mRNA Imaging in Tissues

seqFISH with amplification and error correction provides a highly quantitative method to profile hundreds of mRNA species directly in single cells within their native anatomical context. Our method of stripping the probes from the RNA has many advantages. DNase digestion of probes allows false positives to be rejected, as nonspecifically bound probes do not colocalize between different rounds of hybridization (FIG. 33A). In addition, the same region of the transcript can be hybridized in every round, allowing seqFISH to efficiently target mRNAs shorter than 1 kb and thus most genes. Lastly, seqFISH allows exponential scaling of barcode numbers, so 4-5 rounds of hybridization can code for hundreds of transcripts with a simple error correction scheme. Theoretically, the entire transcriptome can be coded for with error correction using 8-9 rounds of hybridization with seqFISH. These advantages of HCR seqFISH allow robust multiplexed RNA detection in tissues, shown here in the mouse brain.
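The exponential scaling claim can be checked with integer arithmetic: F colors over r rounds give F^r codes, and tolerating one dropped round costs one extra round. The sketch below assumes exactly one added check round and, for the whole-transcriptome case, roughly 20,000 protein-coding genes (an assumption, not a figure from this document). It reproduces the 4 rounds used for 125 genes, the 5 rounds used for the 214 barcoded genes, and 8-9 rounds for a transcriptome with 4-5 colors.

```python
def coding_rounds(n_genes, n_colors):
    """Smallest r with n_colors**r >= n_genes (exact integer arithmetic)."""
    r, capacity = 0, 1
    while capacity < n_genes:
        capacity *= n_colors
        r += 1
    return r


def rounds_with_check(n_genes, n_colors):
    """Coding rounds plus one extra round to tolerate a single dropout."""
    return coding_rounds(n_genes, n_colors) + 1


# ~20,000 genes is an assumed transcriptome size for illustration.
print(rounds_with_check(125, 5), rounds_with_check(20000, 5), rounds_with_check(20000, 4))  # 4 8 9
```

A loop is used instead of `math.ceil(math.log(n, F))` to avoid floating-point rounding at exact powers such as 125 = 5³.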

Ultimately, the multiplexing capability of seqFISH is limited by the amount of optical space within a cell, not by the coding capacity of the method (supplementary text). We showed previously that super-resolution microscopy can significantly increase the optical space available in the cell for transcription profile imaging, but super-resolution microscopy experiments proved difficult to image in samples thicker than 1 μm and were experimentally cumbersome and time-consuming (Lubeck and Cai, 2012). Recent developments in expansion microscopy as well as correlation methods (Coskun et al., 2016), however, offer promise for multiplexing at high transcript densities (Chen et al., 2015a; Treweek et al., 2015; Chen et al., 2016). In addition, by labeling subcellular components (e.g., dendrites and axons) with antibodies, the local transcriptome in compartments of the cell can be measured.

We observed that, because expression patterns among genes are highly correlated, the distinction between large classes of cells can be determined from 10-20 genes, while a finer classification of cell clusters depends on the quantitative measurement of the combinatorial expression patterns of many genes (FIGS. 34E and F). This correlation among genes can be used to “stitch” our seqFISH data with single cell RNA-seq data, similar to the approach explored with single cell RNA-seq and ISH by Satija et al. (Satija et al., 2015). By correlating seqFISH data with single cell RNA-seq expression data, cell types identified based on RNA-seq can be “mapped” back onto our seqFISH data.
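One simple way to "map" RNA-seq cell types onto seqFISH cells is nearest-profile correlation over the shared genes. This is a deliberately simpler stand-in for the Satija et al. approach, not a reproduction of it: it assumes a log1p transform and Pearson correlation, and `map_to_reference` is a hypothetical name.

```python
import numpy as np


def map_to_reference(query, reference, ref_labels):
    """Label each seqFISH cell with its most correlated reference cell type.

    query      : cells x genes (seqFISH), restricted to genes shared with the reference
    reference  : types x genes (e.g. mean scRNA-seq profile per type), same gene order
    ref_labels : names of the reference types
    Assumptions: log1p transform, Pearson correlation; rows must not be constant.
    """
    q = np.log1p(np.asarray(query, dtype=float))
    r = np.log1p(np.asarray(reference, dtype=float))
    q = (q - q.mean(axis=1, keepdims=True)) / q.std(axis=1, keepdims=True)
    r = (r - r.mean(axis=1, keepdims=True)) / r.std(axis=1, keepdims=True)
    corr = q @ r.T / q.shape[1]  # Pearson r for every (cell, type) pair
    best = corr.argmax(axis=1)
    return [ref_labels[i] for i in best], corr
```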

As shown here, seqFISH with hundreds of genes in tissues can become a general and widely used tool to answer a wide range of fundamental questions in biology and medicine. For neuroscience, by combining the insights into the spatial organization of transcription provided by seqFISH with connectomics and electrophysiological measurements, we can obtain a comprehensive understanding of the molecular basis of the neuroanatomy of the brain.

Example 5 Supplementary Experimental Procedure for Brain Slide Analysis

Probe Design.

Genes were selected from the Allen Brain Atlas database. We identified genes that are heterogeneously expressed in coronal sections containing the hippocampus at Bregma coordinates −2.68 mm anterior. Using the ABA region definitions, we broke down the voxels representing the ABA data in those brain sections into 160 distinct regions and averaged the expression values within each region. We selected 100 genes that had high variance across these distinct regions and that also had low-to-medium expression levels. These genes included transcription factors and signaling pathway components as well as ion channels and other functional genes. Lastly, we chose 25 genes from single cell RNA-seq data that were enriched in certain cell types. Briefly, the design criteria used were: 1) constant regions of all spliced isoforms were identified, 2) masked regions of the UCSC genome were removed from possible probe design, 3) 35-mer sequences were tiled 4 nt apart, 4) sets of non-overlapping probes with the tightest GC range around 55% were found, and 5) probes were BLASTed for off-target hits. Any probe with an expected total off-target copy number of more than 5000 was dropped. Once all possible probes for every target gene were acquired, the probe set oligo-pool was optimized using the following criteria: 1) the expected number of off-target hits for the entire probe pool was calculated, 2) probes were sequentially dropped from genes until any off-target gene was hit by no more than 6 probes from the entire pool, 3) HCR adapters were added to the designed probes, and 10 nt in either direction of the adapter junction was BLASTed and screened for off-target hits, 4) probe pools were searched for regions of 18-mer complementarity, 5) the probe set for a given transcript was refined down to 24 probes by dropping probes in order of the expected number of off-target hits, and 6) cutting sites and hybridization-specific primers were added to the probes.
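Design steps 3) and 4) above (tiling 35-mers every 4 nt and keeping non-overlapping probes with GC content near 55%) can be sketched as follows. Isoform intersection, genome masking and BLAST filtering are not modelled. The greedy "closest-to-55%-first" selection, the ±10% GC window and the function names are assumptions for illustration; only the 35 nt length, 4 nt step, 55% target and 24-probe cap come from the text.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tile_candidates(target, probe_len=35, step=4):
    """All probe_len windows starting every `step` nt along the target."""
    return [(start, target[start:start + probe_len])
            for start in range(0, len(target) - probe_len + 1, step)]


def pick_non_overlapping(target, probe_len=35, step=4, gc_center=0.55,
                         gc_window=0.10, max_probes=24):
    """Greedily keep non-overlapping windows whose GC is closest to gc_center.

    gc_window and the greedy order are hypothetical choices, not the source's.
    """
    cands = [(abs(gc_fraction(s) - gc_center), start, s)
             for start, s in tile_candidates(target, probe_len, step)
             if abs(gc_fraction(s) - gc_center) <= gc_window]
    cands.sort()  # closest GC first, ties broken by position
    chosen = []
    for _, start, s in cands:
        if all(abs(start - c) >= probe_len for c, _ in chosen):
            chosen.append((start, s))
            if len(chosen) == max_probes:
                break
    return sorted(chosen)
```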

Probe Generation.

All oligoarray pools were purchased as 92 k synthesis from Customarray Inc. Probes were amplified from the array-synthesized oligo pool, with the following modifications: (i) a 35 nt RNA-targeting sequence for in situ hybridization, (ii) a 35 nt HCR initiator sequence designed to initiate one color of 5 possible HCR polymers, (iii) two hybridization-specific flanking primer sequences to allow PCR amplification of the probe set, and (iv) EcoRI (5′-GAATTC-3′) and KpnI (5′-GGTACC-3′) sites for cutting out the flanking primers to reduce probe size. Ethanol precipitation was used to purify the final digested probes.
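The primer-removal step can be illustrated on the top strand of an amplicon. EcoRI cuts between G and AATTC and KpnI cuts between GGTAC and C; the layout assumed below (5′ primer, EcoRI site, probe body, KpnI site, 3′ primer, each site occurring once) is hypothetical, since the text does not give the order of elements, and overhangs on the complementary strand are ignored.

```python
ECORI = "GAATTC"  # EcoRI cuts G^AATTC
KPNI = "GGTACC"   # KpnI cuts GGTAC^C


def trim_flanks(amplicon):
    """Top-strand fragment left after EcoRI and KpnI digestion (hypothetical layout).

    Assumes 5'-primer-EcoRI-...probe body...-KpnI-primer-3' with one site each.
    """
    s = amplicon.upper()
    e = s.find(ECORI)
    k = s.rfind(KPNI)
    if e < 0 or k < 0 or k < e:
        raise ValueError("expected one EcoRI site upstream of one KpnI site")
    return s[e + 1:k + 5]  # EcoRI cuts after G; KpnI cuts before the final C


# Toy amplicon: made-up primers flanking an 8 nt stand-in for the probe body.
print(trim_flanks("ACGTACGTAC" + "GAATTC" + "TTTTTTTT" + "GGTACC" + "CAGTCAGTCA"))
```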

Brain Extraction and Sample Mounting.

C57BL/6 with Ai6 Cre-reporter (uncrossed) (Jackson Labs, SN: 007906) female mice aged 50-80 days were anesthetized with isoflurane according to institute protocols (protocol #1701-14). No randomization of mice was used, and blinding was not necessary as the study was exploratory. Mice were perfused for 8 minutes with perfusion buffer (10 U/ml heparin, 0.5% NaNO₂ (w/v) in 0.1 M PBS at 4° C.). Mice were then perfused with fresh 4% PFA/0.1 M PBS buffer at 4° C. for 8 minutes. The mouse brain was dissected out of the skull and immediately placed in 4% PFA buffer for 2 hours at room temperature under gentle mixing. The brain was then immersed in 4° C. 30% RNase-free sucrose (Amresco 0335-2.5 KG)/1× PBS until it sank. After the brain sank, it was frozen in OCT media in a dry ice/isopropanol bath and stored at −80° C. Fifteen-micron sections were cut using a cryotome and immediately placed on an aminosilane-modified coverslip.

Sample Permeabilization, Hybridization, and Imaging.

Brain sections mounted on coverslips were permeabilized in 70% EtOH at 4 °C for 12-18 hours. Brains were further permeabilized by the addition of RNase-free 8% SDS (Ambion AM9822) for 10 minutes. Samples were rinsed to remove SDS and desiccated, and a hybridization chamber (Grace Bio-Labs 621505) was adhered around the brain section. Samples were hybridized overnight at 37 °C with split-color PGK1 probes in hybridization buffer (2×SSC (Invitrogen 15557-036), 10% formamide (v/v) (Ambion AM9344), 10% dextran sulfate (Sigma D8906), and 2 mM vanadyl ribonucleoside complex (VRC; NEB 514025) in UltraPure water (Invitrogen 10977-015)). Samples were washed in 30% wash buffer (WBT: 2×SSC, 30% formamide (v/v), 10% dextran sulfate, 0.1% Triton X-100 (Sigma X-100), and 2 mM VRC in UltraPure water) for 30 minutes. While the samples were washing, aliquoted HCR hairpins (Molecular Instruments Inc.) were heated to 95 °C for 1.5 minutes and allowed to cool to room temperature for 30 minutes. The HCR hairpins were diluted to a concentration of 120 nM per hairpin in amplification buffer (2×SSC, 10% dextran sulfate) and added to the washed tissue for 45 minutes. Following amplification, samples were washed in the same 30% WBT for at least 10 minutes to remove excess hairpins. Samples were stained with DAPI and submerged in pyranose oxidase antibleaching buffer. Sample port covers were closed with a glass coverslip or a transparent polycarbonate sheet to exclude oxygen.
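As a worked example of the 120 nM per-hairpin dilution, the helper below applies C1·V1 = C2·V2 to find how much of each snap-cooled hairpin stock to add and how much amplification buffer makes up the rest. The 3 µM stock concentration, the 200 µL final volume, and the count of 10 hairpins (two hairpins, H1 and H2, for each of 5 amplifier colors) are illustrative assumptions, not values given in this protocol.

```python
def hairpin_mix(final_volume_ul, n_hairpins, stock_nm=3000.0, target_nm=120.0):
    """Return (uL of each hairpin stock, uL of amplification buffer).

    Uses C1 * V1 = C2 * V2 for each hairpin. stock_nm is an ASSUMED stock
    concentration; use the value on the actual hairpin datasheet.
    """
    per_hairpin = target_nm * final_volume_ul / stock_nm
    buffer_ul = final_volume_ul - n_hairpins * per_hairpin
    if buffer_ul < 0:
        raise ValueError("stock too dilute for this many hairpins at this volume")
    return per_hairpin, buffer_ul


# Hypothetical example: 5 colors x 2 hairpins each, 200 uL final volume.
per_hp, buf = hairpin_mix(200.0, n_hairpins=10)
print(f"{per_hp:.1f} uL of each hairpin + {buf:.1f} uL amplification buffer")
```

With these assumed numbers the mix is 8.0 µL of each of the 10 hairpins plus 120.0 µL of buffer.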

Samples were imaged using a standard epifluorescence microscope (Nikon Ti Eclipse with a custom-built laser assembly) for the 125-gene experiment. Exposure times were 200 ms for the Cy7 and Alexa 488 channels and 100 ms for the Alexa 647, Alexa 594, and Cy3B channels. For the 249-gene experiment, a Yokogawa CSU-W1 spinning-disk confocal unit attached to an Olympus IX-81 base was used for imaging, with an exposure time of 500 ms for each channel. At this stage, because the split-color PGK1 probes label each transcript in two colors, intact and accessible mRNA should always appear in two channels. If the RNA was deemed to be intact, DAPI images were collected during this hybridization. Samples were then digested with DNase I (Roche 04716728001) for 4 hours at room temperature on the microscope. Following DNase I digestion, the sample was washed several times with 30% WBT and hybridized overnight at room temperature with the experiment probes at 1 nM per probe sequence in 70% formamide hybridization buffer. Samples were again washed and amplified as before. Barcode digits were developed by repeating this cycle with the appropriate probes for each hybridization. A fluorescent Nissl stain (ThermoFisher N-21480) was collected at the end of the experiment, along with images of multispectral beads to aid chromatic aberration correction.
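The text does not say how RNA intactness was scored from the two-channel PGK1 signal. One simple metric, sketched below as an assumption, is the fraction of spots in one channel that have a partner in the other channel within a small radius. The 1-pixel radius is hypothetical, any pass/fail cutoff is left to the user, and spot coordinates are assumed to be already corrected for chromatic aberration.

```python
import numpy as np
from scipy.spatial import cKDTree


def colocalization_fraction(spots_a, spots_b, radius=1.0):
    """Fraction of channel-A spots with a channel-B spot within `radius` pixels."""
    spots_a = np.asarray(spots_a, dtype=float)
    spots_b = np.asarray(spots_b, dtype=float)
    if len(spots_a) == 0 or len(spots_b) == 0:
        return 0.0
    # With distance_upper_bound, missing neighbours come back with distance inf.
    dist, _ = cKDTree(spots_b).query(spots_a, distance_upper_bound=radius)
    return float(np.isfinite(dist).mean())
```

Computing the fraction in both directions (A to B and B to A) guards against one channel being much denser than the other.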

Image Processing.

To remove the effects of chromatic aberration, the multispectral beads were first used to create geometric transforms to align all fluorescence channels. Next, the background illumination profile of every fluorescence channel was mapped using a morphological image opening with a large structuring element. These illumination profile maps were used to flatten the illumination in post-processing, resulting in relatively uniform background intensity while preserving the intensity profile of fluorescent points. The background signal was then subtracted using the ImageJ rolling-ball background subtraction algorithm with a radius of 3 pixels. Finally, the calculated geometric transforms were applied to each channel. A 150-pixel border region around the image was ignored in all analyses to avoid errors from illumination edge effects.
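The per-channel steps above (illumination flattening, background subtraction, border exclusion) can be sketched with NumPy and SciPy. This is a minimal sketch under assumptions: the 301-pixel profile window is a hypothetical stand-in for the unspecified "large structuring element", ImageJ's rolling ball is approximated by a grey opening with a flat disk of radius 3 (a flat rather than curved structuring element), and the bead-derived chromatic-aberration transforms are not shown.

```python
import numpy as np
from scipy import ndimage


def disk(radius):
    """Boolean disk-shaped footprint with the given radius in pixels."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius


def preprocess_channel(img, profile_size=301, ball_radius=3, border=150):
    """Flatten illumination, subtract local background, and drop the border."""
    img = np.asarray(img, dtype=float)
    # Illumination profile: a large flat opening removes diffraction-limited
    # spots and keeps the slowly varying background field.
    profile = ndimage.grey_opening(img, size=(profile_size, profile_size))
    profile = np.maximum(profile, 1e-6)
    # Divide by the mean-normalised profile so intensities keep their scale.
    flat = img / (profile / profile.mean())
    # ASSUMPTION: a flat-disk opening approximates ImageJ's rolling ball.
    # The difference is >= 0 because an opening never exceeds its input.
    background = ndimage.grey_opening(flat, footprint=disk(ball_radius))
    corrected = flat - background
    rows, cols = corrected.shape
    return corrected[border:rows - border, border:cols - border]
```

Slicing with `rows - border` rather than `-border` keeps `border=0` valid. On full-size images a 301×301 opening is slow; downsampling before estimating the profile is a common speed-up.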

Image Registration.

The processed images were then registered by first taking a maximum intensity projection along the z direction in each channel. The maximum projections of all channels of a single hybridization were then collapsed into one composite image, resulting in 4 composite images, each containing all the points in a particular round of hybridization. The composite images of hybridizations 1-3 were each cross-correlated with the composite image of hybridization 4, and the position of the maximum of the cross-correlation was used as the translation to align hybridizations 1-3 to hybridization 4.
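The registration step can be sketched as an FFT-based cross-correlation whose peak gives an integer translation. Assumptions: channels are collapsed with a maximum (the text says only "collapsed"), the correlation is circular (no zero-padding or windowing, which real, non-periodic images may benefit from), and only integer-pixel shifts are estimated. Function names are hypothetical.

```python
import numpy as np


def composite_image(stack):
    """stack: (channels, z, y, x) for one hybridization -> 2D composite.

    Max-projects along z, then collapses the channels. ASSUMPTION: channels
    are combined with a max; the original text does not specify the operation.
    """
    return np.asarray(stack, dtype=float).max(axis=1).max(axis=0)


def estimate_shift(moving, reference):
    """Integer (dy, dx) that aligns `moving` to `reference` when used with np.roll.

    Read from the peak of the circular cross-correlation computed via FFT.
    """
    xcorr = np.fft.ifft2(np.fft.fft2(reference) * np.conj(np.fft.fft2(moving))).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Peaks past the midpoint wrap around to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, xcorr.shape))


def register_to_last(composites):
    """Align every composite to the final hybridization's composite."""
    reference = composites[-1]
    return [np.roll(c, estimate_shift(c, reference), axis=(0, 1)) for c in composites]
```

In practice the shift would be applied to the processed images (or spot coordinates) of each hybridization rather than to the composites themselves.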

Cell Segmentation.

For cells in the cortex, cells were segmented manually using the DAPI images taken in the first round of hybridization and the fluorescent Nissl stain taken at the end of the experiment. The density of the point cloud surrounding a cell was also taken into account when forming cell boundaries, especially for cells that did not stain with the Nissl stain. For the hippocampus, cells were segmented by first manually selecting the 3D centroid of the DAPI signal of every cell. Transcripts were first assigned to the nearest centroid. These point clouds were then used to refine the centroid estimates and create a 3D Voronoi tessellation with a 10% boundary-shrinking factor to eliminate ambiguous mRNA assignments from neighboring cells.
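The hippocampal assignment can be sketched without building Voronoi polygons explicitly: nearest-centroid assignment is exactly a Voronoi assignment. Interpreting the "10% boundary-shrinking factor" as scaling each Voronoi cell by 0.9 about its centroid is an assumption. Under that reading, a transcript lies in its shrunken cell exactly when pushing it outward from the centroid by 1/0.9 keeps it in the same Voronoi cell, because each cell is convex and contains its own centroid. Function names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def assign_transcripts(points, centroids, shrink=0.10):
    """Cell index for each transcript, or -1 if it falls in the shrunk-away margin."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    tree = cKDTree(centroids)
    _, nearest = tree.query(points)  # plain Voronoi (nearest-centroid) assignment
    c = centroids[nearest]
    # ASSUMPTION: shrinking = scaling each cell by (1 - shrink) about its centroid.
    pushed = c + (points - c) / (1.0 - shrink)
    _, nearest_pushed = tree.query(pushed)
    return np.where(nearest_pushed == nearest, nearest, -1)


def refine_centroids(points, labels, centroids):
    """Move each centroid to the mean of its assigned transcripts, if any."""
    points = np.asarray(points, dtype=float)
    new = np.array(centroids, dtype=float)
    for k in range(len(new)):
        members = points[labels == k]
        if len(members):
            new[k] = members.mean(axis=0)
    return new
```

A plausible flow mirroring the text: assign with `shrink=0.0`, call `refine_centroids`, then reassign with `shrink=0.10` so that transcripts near shared boundaries are dropped.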

Barcode Calling.

Potential mRNA signals were then found by Laplacian of Gaussian (LoG) filtering of the registered images and finding local maxima above a specified threshold value. Once all potential points in all channels of all hybridizations were obtained, dots were matched to potential barcode partners in all other channels of all other hybridizations using a 1-pixel search radius to find symmetric nearest neighbors. Point combinations that constructed only a single barcode were immediately matched to the on-target barcode set. For points that matched to construct multiple barcodes, the point sets were first filtered by calculating the residual spatial distance of each potential barcode point set, and only the point sets giving the minimum residuals were used to match to a barcode. If multiple barcodes were still possible, the point was matched to its closest on-target barcode within a Hamming distance of 1. If multiple on-target barcodes were still possible, the point was dropped from the analysis as an ambiguous barcode. This procedure was repeated using each hybridization as a seed for barcode finding, and only barcodes that were called consistently in at least 3 of the 4 rounds were used in the analysis. The number of each barcode was then counted in each of the assigned cell volumes, and transcript numbers were assigned based on the number of on-target barcodes present in the cell volume. All image processing and image analysis code can be obtained upon request.
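The core of the barcode-calling logic can be sketched as follows: LoG spot detection, symmetric (mutual) nearest-neighbor matching within 1 pixel between the seed hybridization and every other hybridization, and decoding that accepts an exact codeword or a unique codeword at Hamming distance 1. This is an illustrative sketch, not the authors' code: the residual-distance tie-breaking and the 3-of-4 seed consensus are omitted, a missing partner is simply counted as one mismatch, and the LoG sigma, threshold, and codebook format are assumptions.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree


def find_spots(img, sigma=1.0, threshold=0.1):
    """Candidate spots: local maxima of the negated LoG response above threshold."""
    response = -ndimage.gaussian_laplace(np.asarray(img, dtype=float), sigma=sigma)
    is_peak = response == ndimage.maximum_filter(response, size=3)
    return np.argwhere(is_peak & (response > threshold))


def mutual_nearest(a, b, radius=1.0):
    """Map i -> j where a[i] and b[j] are each other's nearest neighbour within radius."""
    da, ja = cKDTree(b).query(a, distance_upper_bound=radius)
    _, ib = cKDTree(a).query(b, distance_upper_bound=radius)
    return {i: int(j) for i, j in enumerate(ja) if np.isfinite(da[i]) and ib[j] == i}


def hamming(u, v):
    return sum(x != y for x, y in zip(u, v))


def decode(observed, codebook):
    """Exact match, else the unique codeword at Hamming distance 1, else None."""
    exact = [gene for gene, code in codebook.items() if tuple(code) == observed]
    if exact:
        return exact[0]
    near = [gene for gene, code in codebook.items() if hamming(code, observed) == 1]
    return near[0] if len(near) == 1 else None  # ambiguous or too far: drop


def call_barcodes(hybs, codebook, seed=0, radius=1.0):
    """hybs[h] = (coords array (k, 2), colour labels of length k) for hybridization h."""
    seed_xy = np.asarray(hybs[seed][0], dtype=float)
    partners = [mutual_nearest(seed_xy, np.asarray(xy, dtype=float), radius)
                for xy, _ in hybs]
    calls = []
    for i in range(len(seed_xy)):
        # No partner in a round -> None, which counts as one mismatch.
        observed = tuple(hybs[h][1][partners[h][i]] if i in partners[h] else None
                         for h in range(len(hybs)))
        calls.append(decode(observed, codebook))
    return calls
```

Running `call_barcodes` once per seed hybridization and keeping calls that agree across at least 3 of the 4 seeds would approximate the consensus rule described above.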

Clustering.

To cluster the dataset of 14,908 cells with 125 genes profiled, we first z-score normalized the expression of each gene. Once the single-cell gene expression data were converted into z-scores, we computed a matrix of cell-to-cell correlations using Pearson correlation coefficients. Hierarchical clustering with Ward linkage was then performed on the cell-to-cell correlation data for cells in the center field of view. The cluster definitions were then propagated to the remaining cells using a random forest machine learning algorithm. To analyze the robustness of individual clusters, a random forest model was trained using varying subsets of the data and used to predict the cluster assignments of the remaining cells. A bootstrap analysis was performed by dropping different sets of cells in increments (FIG. 42). To determine the effect of dropping out genes on the accuracy of the clustering analysis, we used a random forest to learn the cluster definitions based on the 125-gene data. We then asked the random forest to re-compute the cluster assignments on cell-to-cell correlation matrices computed with fewer and fewer genes (FIG. 34F, green line). Bootstrap resampling was also performed with this analysis (FIG. 34F, blue lines). The PCA and tSNE analyses were performed using the same z-scored cell-to-cell Pearson correlation matrix. The cell-to-cell correlation in FIG. 34E was calculated with an increasing number of principal components dropped (i.e., with their eigenvalues set to zero). The cluster assignment accuracy was again computed with the random forest.
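The first stages of this clustering (per-gene z-scoring, a cell-by-cell Pearson correlation matrix, Ward-linkage hierarchical clustering) can be sketched with NumPy and SciPy. Assumptions: Ward linkage is applied by treating each cell's row of correlations as its feature vector, the number of clusters is supplied by the user, and the random-forest propagation to the remaining cells (which could be done with, for example, scikit-learn's `RandomForestClassifier`) is omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_cells(counts, n_clusters):
    """counts: cells x genes matrix. Returns 1-based cluster labels per cell."""
    x = np.asarray(counts, dtype=float)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0                  # avoid dividing by zero for constant genes
    z = (x - x.mean(axis=0)) / sd      # z-score each gene across cells
    corr = np.corrcoef(z)              # cell-by-cell Pearson correlation matrix
    # Ward linkage on the rows of the correlation matrix (cells as observations).
    tree = linkage(corr, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Using correlation profiles as features groups cells whose relationships to all other cells are similar, which is less sensitive to overall expression scale than clustering raw counts.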

Optical Space for Barcodes in Cells.

The theoretical upper limit for the number of barcodes that can be identified accurately within a cell depends primarily on the volume of the cell. Because mRNA spots are diffraction limited, if a microscope is configured to have a sub-diffraction-limited pixel size, identifying smFISH signals without any super-resolution method requires that no two mRNA signals be immediately adjacent to each other in the x, y, or z dimension. These minimum required voxels are called “coding voxels.” The absolute upper limit of the number of transcripts that can be coded unambiguously without super-resolution methods is solely a function of the number of coding voxels present in a cell. Assuming a diffraction limit of λ µm and a resolution of z µm in the z direction, there exist

$\frac{V}{\left( {3\lambda} \right)^{2}z}$

coding voxels per cell, where V is the volume of the cell in cubic microns. In the seqFISH method, we use 5 or more channels to hold mRNA spots, which increases the total number of coding voxels by a multiplicative factor equal to the number of channels used for barcoding. Therefore,

${\# B} = \frac{FV}{\left( {3\lambda} \right)^{2}z}$

where #B is the maximum number of unambiguous barcodes a cell can hold, and F is the number of channels used. As mammalian cells range from about 500 to 4000 cubic microns in volume, these cells can accommodate roughly 6,100 to 49,000 barcodes, assuming that 5 fluorescence channels are used, the diffraction limit is 0.3 µm, and the z resolution is 0.5 µm. In principle, this calculation gives the total number of perfectly discernible spots a cell can accommodate. In our actual experimental data, some barcodes are dropped because of ambiguity in barcode assignment caused by spot overlaps. This is one of the main factors that reduces the efficiency of seqFISH compared with single-transcript detection (i.e., smFISH or smHCR). Expansion microscopy could further increase the number of coding voxels in a cell by the expansion factor, leading to fewer dropped barcodes and enabling imaging of denser transcripts.
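Plugging the stated parameters into #B = FV/((3λ)²z) is a quick check on the quoted range. The sketch below is a direct transcription of that formula; the defaults are the values given above (F = 5, λ = 0.3 µm, z = 0.5 µm), and the function name is hypothetical.

```python
def max_barcodes(volume_um3, channels=5, diffraction_um=0.3, z_res_um=0.5):
    """#B = F * V / ((3 * lambda)^2 * z): coding voxels per cell times channels."""
    coding_voxel_um3 = (3 * diffraction_um) ** 2 * z_res_um  # 0.405 um^3 here
    return channels * volume_um3 / coding_voxel_um3


for v in (500, 4000):
    print(v, round(max_barcodes(v)))
```

With these values each coding voxel occupies 0.405 µm³, giving about 6,170 barcodes for a 500 µm³ cell and about 49,400 for a 4000 µm³ cell, consistent with the rough range stated above.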

Additional background information can be found in the following references, each of which is hereby incorporated by reference in its entirety.

-   Beliveau, B. J., Joyce, E. F., Apostolopoulos, N., Yilmaz, F., Fonseka, C. Y., McCole, R. B., Chang, Y., Li, J. B., Senaratne, T. N., Williams, B. R., et al. (2012). Versatile design and synthesis platform for visualizing genomes with Oligopaint FISH probes. Proc. Natl. Acad. Sci. U.S.A. 109, 21301-21306.
-   Betzig, E., Patterson, G. H., Sougrat, R., Lindwasser, O. W., Olenych, S., Bonifacino, J. S., Davidson, M. W., Lippincott-Schwartz, J., and Hess, H. F. (2006). Imaging Intracellular Fluorescent Proteins at Nanometer Resolution. Science 313, 1642-1645.
-   Breiman, L. (2001). Random Forests. Mach. Learn. 45, 5-32.
-   Bose S, Wan Z, Carr A, Rizvi A H, Vieira G, Pe'er D, Sims P A. Scalable microfluidics for single-cell RNA printing and sequencing. Genome Biol. 2015 Jun. 6; 16:120.
-   Buenrostro J D, Araya C L, Chircus L M, Layton C J, Chang H Y, Snyder M P, Greenleaf W J. Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat Biotechnol. 2014 June; 32(6):562-8.
-   Cajigas, I. J., Tushev, G., Will, T. J., tom Dieck, S., Fuerst, N., and Schuman, E. M. (2012). The Local Transcriptome in the Synaptic Neuropil Revealed by Deep Sequencing and High-Resolution Imaging. Neuron 74, 453-466.
-   Junyue Cao, Jonathan S. Packer, Vijay Ramani, Darren A. Cusanovich, Chau Huynh, Riza Daza, Xiaojie Qiu, Choli Lee, Scott N. Furlan, Frank J. Steemers, Andrew Adey, Robert H. Waterston, Cole Trapnell, Jay Shendure. Comprehensive single cell transcriptional profiling of a multicellular organism by combinatorial indexing. bioRxiv 104844; doi: https://doi.org/10.1101/104844
-   Cembrowski, M. S., Bachman, J. L., Wang, L., Sugino, K., Shields, B. C., and Spruston, N. (2016). Spatial Gene-Expression Gradients Underlie Prominent Heterogeneity of CA1 Pyramidal Neurons. Neuron 89, 351-368.
-   Cenquizca, L. A., and Swanson, L. W. (2007). Spatial organization of direct hippocampal field CA1 axonal projections to the rest of the cerebral cortex. Brain Res. Rev. 56, 1-26.
-   Chen, F., Tillberg, P. W., and Boyden, E. S. (2015a). Expansion microscopy. Science 347, 543-548.
-   Chen, F., Wassie, A. T., Cote, A. J., Sinha, A., Alon, S., Asano, S., Daugharthy, E. R., Chang, J.-B., Marblestone, A., Church, G. M., Raj, A., and Boyden, E. S. (2016). Nanoscale imaging of RNA with expansion microscopy. Nat Meth advance online publication.
-   Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S., and Zhuang, X. (2015b). Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090.
-   Choi, H. M. T., Beck, V. A., and Pierce, N. A. (2014). Next-Generation in Situ Hybridization Chain Reaction: Higher Gain, Lower Cost, Greater Durability. ACS Nano 8, 4284-4294.
-   Darmanis, S., Sloan, S. A., Zhang, Y., Enge, M., Caneda, C., Shuer, L. M., Gephart, M. G. H., Barres, B. A., and Quake, S. R. (2015). A survey of human brain transcriptome diversity at the single cell level. Proc. Natl. Acad. Sci. 112, 7285-7290.
-   Dong, H.-W., Swanson, L. W., Chen, L., Fanselow, M. S., and Toga, A. W. (2009). Genomic-anatomic evidence for distinct functional domains in hippocampal field CA1. Proc. Natl. Acad. Sci. 106, 11794-11799.
-   Donner Y, Feng T, Benoist C, Koller D. Imputing gene expression from selectively reduced probe sets. Nat Methods. 2012 November; 9(11):1120-5.
-   Duan et al. L1000CDS2: LINCS L1000 characteristic direction signatures search engine. npj Systems Biology and Applications (2016) 2, 16015.
-   Fan, Y., Braut, S. A., Lin, Q., Singer, R. H., and Skoultchi, A. I. (2001). Determination of transgenic loci by expression FISH. Genomics 71, 66-69.
-   Fanselow, M. S., and Dong, H.-W. (2010). Are the dorsal and ventral hippocampus functionally distinct structures? Neuron 65, 7-19.
-   Femino, A. M., Fay, F. S., Fogarty, K., and Singer, R. H. (1998). Visualization of Single RNA Transcripts in Situ. Science 280, 585-590.
-   Fulton D L, Sundararajan S, Badis G, Hughes T R, Wasserman W W, Roach J C, Sladek R. TFCat: the curated catalog of mouse and human transcription factors. Genome Biol. 2009; 10(3):R29. doi: 10.1186/gb-2009-10-3-r29.
-   Habib, N., Li, Y., Heidenreich, M., Swiech, L., Trombetta, J. J., Zhang, F., and Regev, A. (2016). Div-Seq: A single nucleus RNA-Seq method reveals dynamics of rare adult newborn neurons in the CNS. bioRxiv 045989.
-   Ingolia N T, Ghaemmaghami S, Newman J R, Weissman J S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009 Apr. 10; 324(5924):218-23.
-   Jung, M. W., Wiener, S. I., and McNaughton, B. L. (1994). Comparison of spatial firing characteristics of units in dorsal and ventral hippocampus of the rat. J. Neurosci. 14, 7347-7356.
-   Ke, R., Mignardi, M., Pacureanu, A., Svedlund, J., Botling, J., Wahlby, C., and Nilsson, M. (2013). In situ sequencing for RNA analysis in preserved tissue and cells. Nat. Methods 10, 857-860.
-   Kishi, T., Tsumori, T., Yokota, S., and Yasui, Y. (2006). Topographical projection from the hippocampal formation to the amygdala: A combined anterograde and retrograde tracing study in the rat. J. Comp. Neurol. 496, 349-368.
-   Klein, A. M., Mazutis, L., Akartuna, I., Tallapragada, N., Veres, A., Li, V., Peshkin, L., Weitz, D. A., and Kirschner, M. W. (2015). Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells. Cell 161, 1187-1201.
-   Lee, J. H., Daugharthy, E. R., Scheiman, J., Kalhor, R., Yang, J. L., Ferrante, T. C., Terry, R., Jeanty, S. S. F., Li, C., Amamoto, R., et al. (2014). Highly Multiplexed Subcellular RNA Sequencing in Situ. Science 343, 1360-1363.
-   Lein, E. S., Hawrylycz, M. J., Ao, N., Ayres, M., Bensinger, A., Bernard, A., Boe, A. F., Boguski, M. S., Brockway, K. S., Byrnes, E. J., et al. (2007). Genome-wide atlas of gene expression in the adult mouse brain. Nature 445, 168-176.
-   Levesque M J, Ginart P, Wei Y, Raj A. Visualizing SNVs to quantify allele-specific expression in single cells. Nat Methods. 2013 September; 10(9):865-867.
-   Lubeck, E., and Cai, L. (2012). Single-cell systems biology by super-resolution imaging and combinatorial labeling. Nat. Methods 9, 743-748.
-   Lubeck, E., Coskun, A. F., Zhiyentayev, T., Ahmad, M., and Cai, L. (2014). Single-cell in situ RNA profiling by sequential hybridization. Nat. Methods 11, 360-361.
-   Macosko, E. Z., Basu, A., Satija, R., Nemesh, J., Shekhar, K., Goldman, M., Tirosh, I., Bialas, A. R., Kamitaki, N., Martersteck, E. M., et al. (2015). Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 161, 1202-1214.
-   Madisen, L., Zwingman, T. A., Sunkin, S. M., Oh, S. W., Zariwala, H. A., Gu, H., Ng, L. L., Palmiter, R. D., Hawrylycz, M. J., Jones, A. R., et al. (2010). A robust and high-throughput Cre reporting and characterization system for the whole mouse brain. Nat. Neurosci. 13, 133-140.
-   Madisen, L., Mao, T., Koch, H., Zhuo, J., Berenyi, A., Fujisawa, S., Hsu, Y.-W. A., Garcia, A. J., III, Gu, X., Zanella, S., et al. (2012). A toolbox of Cre-dependent optogenetic transgenic mice for light-induced activation and silencing. Nat. Neurosci. 15, 793-802.
-   Mellis I A, Gupte R, Raj A, Rouhanifard S H. Visualizing adenosine-to-inosine RNA editing in single mammalian cells. Nat Methods. 2017 Jun. 12. doi: 10.1038/nmeth.4332.
-   Miller, J. A., Nathanson, J., Franjic, D., Shim, S., Dalley, R. A., Shapouri, S., Smith, K. A., Sunkin, S. M., Bernard, A., Bennett, J. L., Lee, C.-K., Hawrylycz, M. J., Jones, A. R., Amaral, D. G., Sestan, N., Gage, F. H., and Lein, E. S. (2013). Conserved molecular signatures of neurogenesis in the hippocampal subgranular zone of rodents and primates. Development 140(22), 4633-4644.
-   Mortazavi A, Williams B A, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008 July; 5(7):621-628.
-   Muller, R., Stead, M., and Pach, J. (1996). The hippocampus as a cognitive graph. J. Gen. Physiol. 107, 663-694.
-   Nagalakshmi, U., et al. (2008). The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344-1349.
-   O'Keefe, J., and Dostrovsky, J. (1971). The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Res. 34, 171-175.
-   Petrovich, G. D., Canteras, N. S., and Swanson, L. W. (2001). Combinatorial amygdalar inputs to hippocampal domains and hypothalamic behavior systems. Brain Res. Brain Res. Rev. 38, 247-289.
-   Pitkänen, A., Pikkarainen, M., Nurminen, N., and Ylinen, A. (2000). Reciprocal Connections between the Amygdala and the Hippocampal Formation, Perirhinal Cortex, and Postrhinal Cortex in Rat: A Review. Ann. N. Y. Acad. Sci. 911, 369-391.
-   Raj, A., Peskin, C. S., Tranchina, D., Vargas, D. Y., and Tyagi, S. (2006). Stochastic mRNA Synthesis in Mammalian Cells. PLoS Biol. 4, e309.
-   Risold, P. Y., and Swanson, L. W. (1996). Structural evidence for functional domains in the rat hippocampus. Science 272, 1484-1486.
-   Alexander B Rosenberg, Charles Roco, Richard A Muscat, Anna Kuchina, Sumit Mukherjee, Wei Chen, David J Peeler, Zizhen Yao, Bosiljka Tasic, Drew L Sellers, Suzie H Pun, Georg Seelig. Scaling single cell transcriptomics through split pool barcoding. bioRxiv 105163; doi: https://doi.org/10.1101/105163
-   Rust, M. J., Bates, M., and Zhuang, X. (2006). Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat Meth 3, 793-796.
-   Satija, R., Farrell, J. A., Gennert, D., Schier, A. F., and Regev, A. (2015). Spatial reconstruction of single-cell gene expression data. Nat Biotech 33, 495-502.
-   Saunders, R. C., Rosene, D. L., and Van Hoesen, G. W. (1988). Comparison of the efferents of the amygdala and the hippocampal formation in the rhesus monkey: II. Reciprocal and non-reciprocal connections. J. Comp. Neurol. 271, 185-207.
-   Shah, S., Lubeck, E., Zhou, W., and Cai, L. (2016). In Situ Transcription Profiling of Single Cells Reveals Spatial Organization of Cells in the Mouse Hippocampus. Neuron 92, 342-357.
-   Singer Z S, Yong J, Tischler J, Hackett J A, Altinok A, Surani M A, Cai L, Elowitz M B. Dynamic heterogeneity and DNA methylation in embryonic stem cells. Mol Cell. 2014 Jul. 17; 55(2):319-31.
-   Shah, S., Lubeck, E., Schwarzkopf, M., He, T., Greenbaum, A., Sohn, C. H., Lignell, A., Choi, H. M. T., Gradinaru, V., Pierce, N. A., and Cai, L. (2016). Single-molecule RNA detection at depth via hybridization chain reaction and tissue hydrogel embedding and clearing. Development dev.138560. doi:10.1242/dev.138560
-   Ståhl, P. L., Salmén, F., Vickovic, S., Lundmark, A., Navarro, J. F., Magnusson, J., Giacomello, S., Asp, M., Westholm, J. O., Huss, M., Mollbrink, A., Linnarsson, S., Codeluppi, S., Borg, Å., Pontén, F., Costea, P. I., Sahlen, P., Mulder, J., Bergmann, O., Lundeberg, J., and Frisén, J. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78-82. doi:10.1126/science.aaf2403
-   Tasic, B., Menon, V., Nguyen, T. N., Kim, T. K., Jarsky, T., Yao, Z., Levi, B., Gray, L. T., Sorensen, S. A., Dolbeare, T., et al. (2016). Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nat. Neurosci. advance online publication.
-   Thompson, C. L., Pathak, S. D., Jeromin, A., Ng, L. L., MacPherson, C. R., Mortrud, M. T., Cusick, A., Riley, Z. L., Sunkin, S. M., Bernard, A., et al. (2008). Genomic Anatomy of the Hippocampus. Neuron 60, 1010-1021.
-   Treweek, J. B., Chan, K. Y., Flytzanis, N. C., Yang, B., Deverman, B. E., Greenbaum, A., Lignell, A., Xiao, C., Cai, L., Ladinsky, M. S., et al. (2015). Whole-body tissue stabilization and selective extractions via tissue-hydrogel hybrids for high-resolution intact circuit mapping and phenotyping. Nat. Protoc. 10, 1860-1896.
-   Van der Maaten, L., and Hinton, G. (2008). Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 85.
-   Witter, M. P. (1993). Organization of the entorhinal-hippocampal system: A review of current anatomical data. Hippocampus 3, 28-44.
-   Witter, M. P., and Amaral, D. G. (1991). Entorhinal cortex of the monkey: V. Projections to the dentate gyrus, hippocampus, and subicular complex. J. Comp. Neurol. 307, 437-459.
-   Yang, B., Treweek, J. B., Kulkarni, R. P., Deverman, B. E., Chen, C.-K., Lubeck, E., Shah, S., Cai, L., and Gradinaru, V. (2014). Single-Cell Phenotyping within Transparent Intact Tissue through Whole-Body Clearing. Cell.
-   Yang S M, Alvarez D D, Schinder A F. (2015). Reliable Genetic Labeling of Adult-Born Dentate Granule Cells Using Ascl1 CreERT2 and Glast CreERT2 Murine Lines. J Neurosci. 35(46):15379-90.
-   Yi, F., Catudio-Garrett, E., Gabriel, R., Wilhelm, M., Erdelyi, F., Szabo, G., Deisseroth, K., and Lawrence, J. (2015). Hippocampal “cholinergic interneurons” visualized with the choline acetyltransferase promoter: anatomical distribution, intrinsic membrane properties, neurochemical characteristics, and capacity for cholinergic modulation. Front. Synaptic Neurosci. 7.
-   Zeisel, A., Manchado, A. B. M., Codeluppi, S., Lönnerberg, P., Manno, G. L., Juréus, A., Marques, S., Munguba, H., He, L., Betsholtz, C., et al. (2015). Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science aaa1934.

EQUIVALENTS

Having described some illustrative embodiments of the invention, it should be apparent to those skilled in the art that the foregoing is merely illustrative and not limiting, having been presented by way of example only. Numerous modifications and other illustrative embodiments are within the scope of one of ordinary skill in the art and are contemplated as falling within the scope of the invention. In particular, although many of the examples presented herein involve specific combinations of method acts or system elements, it should be understood that those acts and those elements may be combined in other ways to accomplish the same objectives. Acts, elements, and features discussed only in connection with one embodiment are not intended to be excluded from a similar role in other embodiments. Further, for the one or more means-plus-function limitations recited in the following claims, the means are not intended to be limited to the means disclosed herein for performing the recited function, but are intended to cover in scope any means, known now or later developed, for performing the recited function.

Use of ordinal terms such as “first”, “second”, “third”, etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term). Similarly, use of a), b), etc., or i), ii), etc. does not by itself connote any priority, precedence, or order of steps in the claims. Likewise, the use of these terms in the specification does not by itself connote any required priority, precedence, or order.

The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the invention. The present invention is not to be limited in scope by the examples provided, since the examples are intended as a single illustration of one aspect of the invention and other functionally equivalent embodiments are within the scope of the invention. Various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description and fall within the scope of the appended claims. The advantages and objects of the invention are not necessarily encompassed by each embodiment of the invention.

1. A method of barcoding molecular targets, comprising: identifying N molecular targets in a biological sample, wherein the N molecular targets are immobilized; associating a unique barcode to each molecular target via n sequential barcoding rounds (where n≥2), wherein each barcoding round comprises m serial hybridizations of probes collectively bound to the N molecular targets (where m≥2), wherein each serial hybridization comprises: contacting one or more groups of probes to a subset of the N molecular targets, the total number of groups of probes corresponding to the number of molecular targets in the subset, wherein probes in each group comprise one or more binding sequences specifically targeting a molecular target in the subset, wherein each probe is capable of generating at least one detectable visual signal representing binding of the probe to a molecular target in the subset, and wherein probes in the one or more groups generate one or more different detectable visual signals corresponding to the number of molecular targets in the subset; detecting the detectable visual signals that reflect the binding between the one or more groups of probes and the subset of the N molecular targets; and removing the visual signals, when applicable, prior to the next serial hybridization; wherein the unique barcode to each molecular target consists of n components, each component is assigned from S unique symbols, where S is an integer that is equal to or greater than $\sqrt[n]{N}$; and optionally removing probes between two barcoding rounds.
 2. The method of claim 1, wherein the detecting the detectable visual signals comprises: capturing, for each serial hybridization round, an image of the detectable visual signals that reflect the binding between the one or more groups of probes and the subset of the N molecular targets.
 3. The method of claim 1, further comprising: generating, for each barcoding round, a composite image by superimposing m images corresponding to the m serial hybridizations, wherein the m images are aligned based on one or more alignment references whose positions remain constant relative to the biological sample.
 4. The method of claim 1, further comprising: applying Gaussian analysis to super-localize the detectable visual signals in an image.
 5. The method of claim 1, further comprising: decoding the detectable visual signals in each composite image based on the unique barcodes for the N molecular targets and the S unique symbols.
 6. The method of claim 1, further comprising: detecting reference visual signals associated with the one or more alignment references.
 7. The method of claim 3, wherein the one or more alignment references comprise one or more selected from the group consisting of an oligonucleotide sequence immobilized on the coverslips and detected by a complementary oligo, a common sequence embedded in all probes, a microscopic object, a metal bead, a gold bead, a polystyrene bead, a PCR handle sequence on a primary binding probe, and combinations thereof.
 8. The method of claim 1, wherein the n sequential barcoding rounds include x rounds for error correction, where x is an integer equal to or greater than 1; and wherein assigning unique barcodes for each of N molecular targets requires S unique symbols, where S is an integer equal to or greater than $\sqrt[n-x]{N}$.
 9. The method of claim 1, wherein the biological sample comprises a tissue sample, a cell sample, a cell extract sample, a nucleic acid sample, an RNA transcript sample, a protein sample, an mRNA sample, DNA molecules, protein molecules, RNA and DNA isoform molecules, single nucleotide polymorphism molecules, or combinations thereof.
 10. The method of claim 1, further comprising: determining a secondary molecular target that is associated with the N molecular targets by contacting the biological sample with molecules specifically binding to the secondary molecular target.
 11. The method of claim 10, wherein the secondary molecular target comprises one selected from the group consisting of an RNA binding protein molecule, a ribosome, a DNA binding protein molecule, a transcription factor, a chromatin binding protein, a protein binding molecule, a scaffold protein, and combinations thereof.
 12. The method of claim 1, wherein probes in theone or more groups of probes further comprise: one or more bindingsequences each specifically targeting one or more sites within amolecular target in the biological sample; and n unique readoutsequences associated with the one or more binding sequences, wherein, ineach barcoding round, only one unique readout sequence is associatedwith a detectable visual signal for a particular molecular target. 13.The method of claim 1, wherein the one or more binding sequences targetmultiple different sites within the same molecular target.
 14. Themethod of claim 1, wherein the one or more binding sequences targetmultiple different sites within different molecular targets.
 15. Themethod of claim 1, wherein each probe comprises one or more of the nunique readout sequences.
 16. The method of claim 15, wherein at leastone of the n unique readout sequences is located in an overhang sequencedirectly connected to the binding sequence of a probe.
 17. The method ofclaim 16, wherein at least one of the n unique readout sequences isindirectly connected to the binding sequence of a probe via one or moreintermediate molecules.
 18. The method of claim 17, wherein the one ormore intermediate molecules comprise an RNA bridge probe, a DNA bridgeprobe, a protein bridge probe, a probe for hybridization chain reaction(HCR), a hairpin nucleic acid probe, an HCR initiator, an HCR polymer,or combinations thereof.
 19. The method of claim 1, wherein the one or more binding sequences specifically target one or more non-nucleic acid sites in the molecular target, and wherein the n unique readout sequences comprise nucleic acid sequences that are directly or indirectly connected to the binding sequences of the probes.
 20. The method of claim 1, wherein the detectable visual signal is connected to the binding sequence of a probe or an intermediate molecule via a cleavable linker.
 21. The method of claim 1, wherein the one or more binding sequences comprise a peptide sequence binding to a specific antigen within a particular molecular target, an aptamer, or a click chemistry group.
 22. The method of claim 1, wherein the S unique symbols comprise colors, numbers, letters, shapes, or combinations thereof.
 23. The method of claim 1, wherein, for each serial hybridization, the one or more groups of probes bind to a non-overlapping subset of the N molecular targets.
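To illustrate how the S unique symbols of claims 22 and 23 could be realized as pseudo-colors, the following Python sketch groups the targets of one barcoding round into serial hybridizations. It assumes, for illustration only, that each serial hybridization reads out a fixed number of dye channels and that symbol s is read out in serial hybridization s // channels on dye channel s % channels, so S = (number of serial hybridizations) × channels. The function name `serial_hyb_plan` and this particular mapping are hypothetical, not taken from the claims.

```python
from collections import defaultdict


def serial_hyb_plan(round_symbols, channels):
    """Group targets into serial hybridizations for one barcoding round.

    round_symbols maps target name -> pseudo-color symbol (0 .. S-1) for this round.
    Assumed mapping: symbol s is imaged in serial hybridization s // channels,
    on dye channel s % channels.
    """
    plan = defaultdict(dict)
    for target, s in round_symbols.items():
        if s < 0:
            raise ValueError(f"symbol for {target} must be non-negative")
        plan[s // channels][target] = s % channels
    return dict(plan)


plan = serial_hyb_plan({"geneA": 0, "geneB": 3, "geneC": 5, "geneD": 1}, channels=3)
print(plan)  # {0: {'geneA': 0, 'geneD': 1}, 1: {'geneB': 0, 'geneC': 2}}
```

Because each target receives exactly one symbol per barcoding round, it lands in exactly one serial hybridization, so the subsets probed by different serial hybridizations do not overlap.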
 24. A method of hybridization analysis of binding between labeled probes and molecular targets in a biological sample, comprising: generating multiple composite images of labeled probes bound to a plurality of molecular targets in the biological sample, wherein each composite image is generated from a plurality of images of labeled probes collectively bound to the plurality of molecular targets, wherein the plurality of molecular targets are immobilized within the biological sample, and wherein each image of the plurality of images reveals: labeled probes bound to a subset of molecular targets within the plurality of molecular targets, wherein the labeled probes comprise one or more groups of probes, the total number of groups of probes corresponding to the number of molecular targets in the subset, wherein probes in each group comprise one or more binding sequences specifically targeting a molecular target in the subset, and wherein each labeled probe is capable of generating a visual signal representing binding of the probe to a molecular target; and one or more reference targets whose positions remain constant in the biological sample for aligning the plurality of images.
 25. The method of claim 24, wherein the biological sample comprises a tissue sample, a cell sample, a cell extract sample, a nucleic acid sample, an RNA transcript sample, a protein sample, an mRNA sample, DNA molecules, protein molecules, RNA and DNA isoform molecules, single nucleotide polymorphism molecules, or combinations thereof.
 26. The method of claim 24, wherein, in each image, the labeled probes bind to a non-overlapping subset of molecular targets within the plurality of molecular targets.
 27. The method of claim 24, further comprising: contacting the one or more groups of probes with the subset of molecular targets of the plurality of molecular targets; detecting visual signals that reflect the binding between the one or more groups of probes and molecular targets in the subset; and removing the visual signals, when applicable, prior to a next round of hybridization of labeled probes binding to a new subset of molecular targets within the plurality of molecular targets.
 28. The method of claim 24, further comprising: detecting reference visual signals associated with the one or more alignment references.
 29. The method of claim 24, further comprising: aligning the plurality of images based on the positions of the one or more alignment references.
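Claims 28 and 29 align the images using reference positions that stay constant in the sample. A minimal Python sketch, assuming the images differ only by a rigid translation and that the reference spots (hypothetically, fiducial markers) have already been matched one-to-one between images: under those assumptions the least-squares translation is simply the mean displacement of the matched reference points. The function names are hypothetical; practical registration may also need to handle rotation, scaling, or unmatched references.

```python
def estimate_offset(reference_points, moving_points):
    """Least-squares translation (dx, dy) mapping moving_points onto reference_points.

    Assumes the two lists are matched one-to-one and differ only by a translation.
    """
    if not reference_points or len(reference_points) != len(moving_points):
        raise ValueError("need the same, non-zero number of matched reference points")
    n = len(reference_points)
    dx = sum(r[0] - m[0] for r, m in zip(reference_points, moving_points)) / n
    dy = sum(r[1] - m[1] for r, m in zip(reference_points, moving_points)) / n
    return dx, dy


def apply_offset(points, offset):
    """Shift detected spot coordinates into the reference image's frame."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in points]


ref_points = [(10, 10), (50, 20), (30, 40)]    # reference positions in image 1
moved_points = [(12, 7), (52, 17), (32, 37)]   # the same references seen in image 2
offset = estimate_offset(ref_points, moved_points)
print(offset)                                  # (-2.0, 3.0)
print(apply_offset([(100, 100)], offset))      # [(98.0, 103.0)]
```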
 30. A sequential hybridization method, comprising: identifying a plurality of target genes; and associating, via sequential hybridization of binding probes to the plurality of target genes, a first plurality of unique codes with the plurality of target genes, wherein each target gene in the plurality of target genes is represented by a unique code in the first plurality of unique codes, wherein the sequential hybridization comprises n rounds of hybridization (where n≥2), and wherein each round of hybridization in the n rounds of hybridization comprises: contacting the plurality of target genes with a plurality of binding probes, wherein each probe in the plurality of binding probes comprises: a binding sequence that specifically binds a target sequence in a gene in the plurality of target genes, wherein target genes from the plurality of target genes are spatially transfixed from each other, and wherein each probe is capable of emitting a detectable visual signal upon binding of the probe to a target sequence; detecting visual signals that reflect the binding between the plurality of binding probes and the plurality of target genes; and removing the visual signals, when applicable, prior to the next round of hybridization; wherein probes used in the n rounds of hybridization are capable of emitting at least F types of detectable visual signals (where F≥2 and $F^n$ is greater than the number of target genes in the plurality of target genes), wherein a unique code in the first plurality of unique codes for a target gene consists of n components, wherein each component is determined by visual signals that reflect the binding between binding probes and the target gene during one of the n rounds of hybridization, wherein the n rounds of hybridization include m error correction rounds (m≥1), wherein a second plurality of unique codes for the plurality of target genes is generated after the m error correction rounds are removed from the n rounds of hybridization, and wherein each unique code in the second plurality of unique codes consists of (n−m) components and uniquely represents a target gene in the plurality of target genes.
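Claim 30 requires that the codes remain unique after the m error correction rounds are removed. For m = 1, one construction that satisfies this (an illustrative assumption, not necessarily the scheme the claims intend) is a mod-F parity round: the last component is the sum of the other n−1 components modulo F. Any two such codes differ in at least two components, so deleting any single round still leaves every code distinct, at the cost of reducing capacity from $F^n$ to $F^{n-1}$ codes. The Python sketch below checks this by brute force; the function names are hypothetical.

```python
from itertools import product


def parity_codes(F, n):
    """All n-component codes whose last component is the sum of the first n-1, mod F."""
    return [data + (sum(data) % F,) for data in product(range(F), repeat=n - 1)]


def unique_after_dropping_any_round(codes):
    """True if deleting any single component (round) keeps all codes distinct."""
    n = len(codes[0])
    for drop in range(n):
        reduced = {code[:drop] + code[drop + 1:] for code in codes}
        if len(reduced) != len(codes):
            return False
    return True


codes = parity_codes(F=4, n=4)
print(len(codes))                              # 64, i.e. F**(n-1)
print(unique_after_dropping_any_round(codes))  # True

# Without a parity round, all F**n codes are used and dropping a round causes collisions.
print(unique_after_dropping_any_round(list(product(range(4), repeat=4))))  # False
```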
 31. A hybridization method, comprising: identifying a plurality of target genes; performing sequential hybridization of binding probes to the plurality of target genes, wherein the sequential hybridization comprises n rounds of hybridization (where n≥2), and wherein each round of hybridization in the n rounds of hybridization comprises: contacting the plurality of target genes with a plurality of binding probes, wherein each probe in the plurality of binding probes comprises: a binding sequence that specifically binds a target sequence in a gene in the plurality of target genes, wherein target genes from the plurality of target genes are spatially transfixed from each other, and wherein each probe is capable of emitting a detectable visual signal upon binding of the probe to a target sequence; detecting visual signals that reflect the binding between the plurality of binding probes and the plurality of target genes, wherein each target gene in the plurality of target genes is represented by visual signals that are unique for the target gene, and wherein probes used in the n rounds of hybridization are capable of emitting at least F types of detectable visual signals (where F>2, and $F^n$ is greater than the number of target genes in the plurality of target genes); and removing the visual signals, when applicable, prior to the next round of hybridization; and performing serial hybridizations against one or more serial target genes, wherein the expression level of each serial target gene is above a predetermined threshold value, wherein each serial hybridization comprises: contacting the one or more serial target genes with a plurality of binding probes, wherein each probe in the plurality of binding probes comprises: a binding sequence that specifically binds a target sequence in a serial target gene in the one or more serial target genes, wherein the one or more serial target genes are spatially transfixed from each other, wherein each probe is capable of emitting a detectable visual signal upon binding of the probe to the target sequence, and wherein probes binding to target sequences in the same serial target gene emit the same detectable visual signals; and detecting visual signals that reflect the binding between the plurality of binding probes and the one or more serial target genes.
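The division in claim 31 between barcoded target genes and serial target genes can be sketched as a threshold test on expression level: genes whose expression is above the predetermined threshold are probed by serial hybridization, and the rest are barcoded. In this Python sketch the function name and the expression values are hypothetical placeholders, and how the threshold is chosen is left open.

```python
def split_by_expression(expression, threshold):
    """Return (barcoded_genes, serial_genes); genes above threshold go to serial hybridization."""
    barcoded = sorted(gene for gene, level in expression.items() if level <= threshold)
    serial = sorted(gene for gene, level in expression.items() if level > threshold)
    return barcoded, serial


# Hypothetical expression levels (arbitrary units), for illustration only.
levels = {"geneA": 5000, "geneB": 40, "geneC": 1200, "geneD": 12}
barcoded, serial = split_by_expression(levels, threshold=1000)
print(barcoded)  # ['geneB', 'geneD']
print(serial)    # ['geneA', 'geneC']
```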
 32. The sequential hybridization method of claim 30, wherein each component of an n-component unique code in the first plurality of unique codes is assigned a numerical value that corresponds to one of the at least F types of detectable visual signals; and wherein at least one component of the n-component unique code is determined based on the numerical values of all or some of the other n−1 components.
 33. The sequential hybridization method of claim 30, wherein the n-component unique code is determined as:

$$\{\, j_1,\ j_2,\ \ldots,\ (a_1 j_1 + a_2 j_2 + \cdots + a_n j_n + C) \bmod F,\ j_n \,\}$$

wherein $j_1$ is a numerical value that corresponds to the detectable visual signals used in the first round of hybridization, $j_2$ is a numerical value that corresponds to the detectable visual signals used in the second round of hybridization, and $j_n$ is a numerical value that corresponds to the detectable visual signals used in the nth round of hybridization; and wherein $j_1, j_2, \ldots, j_n$, $a_1, a_2, \ldots, a_n$ and n are non-zero integers and C is an integer.
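A minimal Python sketch of the weighted check component in claims 32 and 33, under stated assumptions: the check is computed over the other (data) components, as claim 32 permits, and it is placed last in the code purely for simplicity. If each weight $a_i$ is coprime to F (for example, when F is prime and no $a_i$ is a multiple of F), the check also allows one missing or unreadable component to be recovered from the others by solving the congruence for it. The function names and example values are hypothetical.

```python
def check_component(data, weights, C, F):
    """Check value (a_1*j_1 + ... + a_k*j_k + C) mod F over the data components."""
    if len(data) != len(weights):
        raise ValueError("need one weight per data component")
    return (sum(a * j for a, j in zip(weights, data)) + C) % F


def encode(data, weights, C, F):
    """Append the check component (placed last here, an assumption for simplicity)."""
    return list(data) + [check_component(data, weights, C, F)]


def recover_missing(code, missing, weights, C, F):
    """Recover one missing component of an encoded code from the others.

    The value stored at index `missing` is ignored (it may be None).
    Requires weights[missing] to be invertible mod F, i.e. gcd(weights[missing], F) == 1.
    """
    *data, check = code
    if missing == len(data):  # the check round itself was lost: recompute it
        return check_component(data, weights, C, F)
    known = sum(a * j for i, (a, j) in enumerate(zip(weights, data)) if i != missing)
    inverse = pow(weights[missing], -1, F)  # modular inverse, Python 3.8+
    return ((check - C - known) * inverse) % F


weights, C, F = [1, 2, 3], 1, 5
print(encode([4, 0, 2], weights, C, F))                      # [4, 0, 2, 1]
print(recover_missing([4, 0, None, 1], 2, weights, C, F))    # 2
```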